Abstract

We use the domination number of a parametrized random digraph family called proportional-edge proximity catch digraphs (PCDs) for testing multivariate spatial point patterns. This digraph family is based on the relative positions of data points from various classes. We extend the results on the distribution of the domination number of proportional-edge PCDs, and use the domination number as a statistic for testing segregation and association against complete spatial randomness. We demonstrate that the domination number of the PCD is asymptotically binomial when the size of one class is fixed while the size of the other (whose points constitute the vertices of the digraph) tends to infinity, and asymptotically normal when the sizes of both classes tend to infinity. We evaluate the finite sample performance of the test by Monte Carlo simulations, prove the consistency of the test under the alternatives, and suggest corrections for the support restriction on the class of points of interest and for small samples. We find the optimal parameters for testing each of the segregation and association alternatives. Furthermore, the methodology discussed in this article is also valid for data in higher dimensions.
1 Introduction



In statistical literature, the problem of clustering has received considerable attention. The spatial interaction between two or more classes has important implications, especially for plant species. See, e.g., Pielou (1961), Dixon (1994, 2002a), Stoyan and Penttinen (2000), and Perry et al. (2006). Recently, a new clustering test based on the relative allocation of points from two or more classes has been developed. The method is based on a graph-theoretic approach and is used to test the spatial pattern of complete spatial randomness (CSR) against segregation or association. Rather than the pattern of points from one class with respect to the ground, the patterns of points from one class with respect to points from other classes are investigated. CSR is roughly defined as the lack of spatial interaction between the points in a given study area. Segregation is the pattern in which points of one class tend to cluster together, i.e., form one-class clumps. On the other hand, association is the pattern in which the points of one class tend to occur more frequently around points from the other class. For convenience and generality, we refer to the different types of points as "classes", but the class can be replaced by any characteristic of an observation at a particular location. For example, the pattern of spatial segregation has been investigated for species (Diggle (2003)), age classes of plants (Hamill and Wright (1986)), and sexes of dioecious plants (Nanami et al. (1999)).

Many methods to analyze spatial clustering have been proposed in the literature (Kulldorff (2006)). These include Ripley's K- or L-functions (Ripley (2004)), comparison of NN distances (Diggle (2003), Cuzick and Edwards (1990)), and analysis of nearest neighbor contingency tables (NNCTs), which are constructed using the NN frequencies of classes (Pielou (1961) and Dixon (1994, 2002a)). The tests (i.e., inference) based on Ripley's K- or L-functions are only appropriate when the null pattern can be assumed to be the CSR independence pattern, but not if the null pattern is the random labeling (RL) of points from an inhomogeneous Poisson pattern (Kulldorff (2006)). However, there are also variants of K(t) that explicitly correct for inhomogeneity (see Baddeley et al. (2000)). Cuzick and Edwards's k-NN tests are designed for testing bivariate spatial interaction and are mostly used for spatial clustering of cases or controls in epidemiology. Diggle's D-function is a modified version of Ripley's K-function (Diggle (2003)) and is appropriate for the case in which the null pattern is the RL of points where the points are a realization from any arbitrary point pattern. Ripley's and Diggle's functions are designed to analyze univariate or bivariate spatial interaction at various scales (i.e., inter-point distances).



In recent years, the use of mathematical graphs has also gained popularity in spatial analysis (Roberts et al. (2000)), providing a way to move beyond Euclidean metrics for spatial analysis. Although only recently introduced to landscape ecology, graph theory is well suited to ecological applications concerned with connectivity or movement (Minor and Urban (2007)). Conventional graphs do not explicitly maintain geographic reference, reducing the utility of other geo-spatial information. Fall et al. (2007) introduce spatial graphs that integrate a geometric reference system that ties patches and paths to specific spatial locations and spatial dimensions, thereby preserving the relevant spatial information. However, after a graph is constructed using spatial data, usually the scale is lost (see, for instance, Su et al. (2007)). Many concepts in spatial ecology depend on the idea of spatial adjacency, which requires information on the close vicinity of an object. Graph theory can conveniently be used to express and communicate adjacency information, allowing one to compute meaningful quantities related to a spatial point pattern. Adding vertex and edge properties to graphs extends the problem domain to network modeling (Keitt (2007)). Wu and Murray (2008) propose a new measure based on graph theory and spatial interaction, which reflects intra-patch and inter-patch relationships by quantifying contiguity within patches and potential contiguity among patches. Friedman and Rafsky (1983) also propose a graph-theoretic method to measure multivariate association, but their method is not designed to analyze spatial interaction between two or more classes; instead it is an extension of generalized correlation coefficients (such as Spearman's $\rho$ or Kendall's $\tau$) to measure multivariate (possibly nonlinear) correlation.

The graph-theoretic method we use to test spatial randomness is based on proximity catch digraphs (PCDs), which are a special type of proximity graphs introduced by Toussaint (1980). A digraph is a directed graph with vertices $V$ and arcs (directed edges) each of which is from one vertex to another based on a binary relation. Then the pair $(p,q) \in V \times V$ is an ordered pair which stands for an arc from vertex $p$ to vertex $q$ in $V$. For example, the nearest neighbor (di)graph, which is defined by placing an arc between each vertex and its nearest neighbor, is a proximity digraph (Paterson and Yao (1992)). The nearest neighbor digraph has the vertex set $V$ and $(p,q)$ as an arc iff $q$ is a nearest neighbor of $p$. The domination number of PCDs is first investigated for data in one Delaunay triangle (in $\mathbb{R}^2$) and the analysis is generalized to data in multiple Delaunay triangles. Some trivial proofs are omitted and shorter proofs are given in the main body of the article.

Data-random digraphs are directed graphs in which each vertex corresponds to a data point, and arcs are defined in terms of some bivariate function on the data. Priebe et al. (2001) introduced a data-random digraph called the class cover catch digraph (CCCD) in $\mathbb{R}$ and extended it to multiple dimensions. In this model, the vertices correspond to data points from a single class $\mathcal{X}$ and the definition of the arcs utilizes the other class $\mathcal{Y}$. For each $x_i \in \mathcal{X}$ a radius is defined as $r_i = \min_{y \in \mathcal{Y}} d(x_i, y)$. There is an arc from $x_i$ to $x_j$ if $d(x_i, x_j) < r_i$; that is, the (open) sphere of radius $r_i$ centered at $x_i$ "catches" $x_j$. DeVinney et al. (2002), Marchette and Priebe (2003), Priebe et al. (2003a), and Priebe et al. (2003b) demonstrated relatively good performance of CCCDs in classification. Their methods involve data reduction (condensing) by using approximate minimum dominating sets as prototype sets (since finding the exact minimum dominating set is an NP-hard problem in general, e.g., for CCCD in multiple dimensions; see DeVinney and Priebe (2006)). For the domination number of CCCDs for one-dimensional data, a SLLN result is proved in DeVinney and Wierman (2003), and this result is extended by Wierman and Xiang (2008); furthermore, a CLT is also proved by Xiang and Wierman (2009). The asymptotic distribution of the domination number of CCCDs for non-uniform data in $\mathbb{R}$ is also calculated in a rather general setting (Ceyhan (2008)). Although intuitively appealing and easy to extend to higher dimensions, finding the minimum dominating set of a CCCD is an NP-hard problem and the distribution of the domination number of CCCDs is not analytically tractable for $d > 1$.

This drawback has motivated us to define new types of proximity maps. As alternatives to CCCD, Ceyhan and Priebe (2003) introduced an (unparametrized) type of PCDs called central similarity PCDs; Ceyhan and Priebe (2005) also introduced another parametrized family of PCDs called proportional-edge PCDs and used the domination number of this PCD with a fixed parameter for testing spatial patterns. The domination number approach is appropriate when at least one of the classes is sufficiently large. The relative (arc) density of these PCDs is also used for testing spatial patterns in Ceyhan et al. (2006) and Ceyhan et al. (2007). These new PCDs are designed to have better distributional and mathematical properties, and both families are also applicable to pattern classification. Ceyhan and Priebe (2003) introduced the central similarity proximity maps and the associated PCDs, and Ceyhan et al. (2007) computed the asymptotic distribution of the relative (arc) density of the parametrized version of the central similarity PCDs and applied the method to testing spatial patterns. Ceyhan and Priebe (2005) introduced the proportional-edge PCD with expansion parameter $r$, where the distribution of the domination number of the proportional-edge PCD with $r = 3/2$ is used in testing spatial patterns of segregation or association. Ceyhan et al. (2006) computed the asymptotic distribution of the relative density of the proportional-edge PCD and used it for the same purpose. Ceyhan and Priebe (2007) derived the asymptotic distribution of the domination number of proportional-edge PCDs for uniform data. An extensive treatment of the PCDs based on Delaunay tessellations is available in Ceyhan (2005).

In this article, we investigate the use of the domination number of proportional-edge PCDs, whose asymptotic distribution was computed in Ceyhan and Priebe (2007), for testing spatial patterns of segregation and association. Furthermore, we extend this result to the whole range of the expansion parameter in a more general setting. By construction, in our PCDs, the further an $\mathcal{X}$ point is from the $\mathcal{Y}$ points, the more likely it is to have more arcs to other $\mathcal{X}$ points, hence the more likely the domination number is to be smaller. This probabilistic behavior makes the domination number suitable as a statistic for testing spatial segregation or association. In addition to its mathematical tractability and applicability to testing spatial patterns and classification, this new family of PCDs is more flexible, as it allows choosing an optimal parameter for testing against various types of spatial point patterns.

We define proximity maps and the associated PCDs in Section 2, present the asymptotic distribution of the domination number for uniform data in one triangle and in multiple triangles in Section 3, describe the alternative patterns of segregation and association in Section 4, present the Monte Carlo simulation analysis to assess the empirical size and power performance in Section ??, suggest an adjustment for data points from the class of interest which are outside the convex hull of data from the other class in Section 5, suggest a correction method for small sample sizes of the class of interest in Section ??, provide an example data set in Section ??, and describe the extension of proportional-edge PCDs to higher dimensions in Section 8. We also provide guidelines for using this test in Section 9.



2 Proximity Maps and the Associated PCDs 



Our PCDs are based on the proximity maps which are defined in a fairly general setting. Let $(\Omega, \mathcal{M})$ be a measurable space. The proximity map $N(\cdot)$ is defined as $N : \Omega \to 2^{\Omega}$, where $2^{\Omega}$ is the power set of $\Omega$. The proximity region associated with $x \in \Omega$, denoted $N(x)$, is the image of $x \in \Omega$ under $N(\cdot)$. The points in $N(x)$ are thought of as being "closer" to $x \in \Omega$ than are the points in $\Omega \setminus N(x)$. Hence the term "proximity" in the name proximity catch digraph. The $\Gamma_1$-region $\Gamma_1(\cdot) = \Gamma_1(\cdot, N_{\mathcal{Y}}) : \Omega \to 2^{\Omega}$ associates the region $\Gamma_1(x) := \{z \in \Omega : x \in N_{\mathcal{Y}}(z)\}$ with each point $x \in \Omega$. Proximity maps are the building blocks of the proximity graphs of Toussaint (1980); an extensive survey on proximity maps and graphs is available in Jaromczyk and Toussaint (1992).

The proximity catch digraph $D$ has the vertex set $V = \{p_1, \ldots, p_n\}$, and the arc set $\mathcal{A}$ is defined by $(p_i, p_j) \in \mathcal{A}$ iff $p_j \in N(p_i)$ for $i \neq j$. Notice that the proximity catch digraph $D$ depends on the proximity map $N(\cdot)$, and if $p_j \in N(p_i)$, then we say that the region $N(p_i)$ (and the point $p_i$) catches point $p_j$. Hence the term "catch" in the name proximity catch digraph. If arcs of the form $(p_i, p_i)$ (i.e., loops) were allowed, $D$ would have been called a pseudodigraph according to some authors (see, e.g., Chartrand and Lesniak (1996)).

In a digraph $D = (V, \mathcal{A})$, a vertex $v \in V$ dominates itself and all vertices of the form $\{u : (v,u) \in \mathcal{A}\}$. A dominating set $S_D$ for the digraph $D$ is a subset of $V$ such that each vertex $v \in V$ is dominated by a vertex in $S_D$. A minimum dominating set $S_D^*$ is a dominating set of minimum cardinality, and the domination number $\gamma(D)$ is defined as $\gamma(D) := |S_D^*|$ (see, e.g., Lee (1998)), where $|\cdot|$ denotes the set cardinality functional. See Chartrand and Lesniak (1996) and West (2001) for more on graphs and digraphs. If a minimum dominating set is of size one, we call it a dominating point. Note that for $|V| = n > 0$, $1 \le \gamma(D) \le n$, since $V$ itself is always a dominating set.
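The definition of the domination number can be illustrated with a small brute-force search. The sketch below is only an illustration under simple assumptions: vertices are labeled `0, ..., n-1`, arcs are given as ordered pairs, and the function name `domination_number` is hypothetical rather than taken from the article. The search is exponential in `n`, so it is meant for toy examples only.

```python
from itertools import combinations


def domination_number(n, arcs):
    """Return (gamma, S) for the digraph on vertices 0..n-1 with the given arcs.

    A vertex v dominates itself and every u with (v, u) in arcs.
    S is one minimum dominating set found by exhaustive search.
    """
    # out[v] = set of vertices dominated by v (v itself plus its out-neighbors)
    out = {v: {v} for v in range(n)}
    for v, u in arcs:
        out[v].add(u)

    # Try subsets in order of increasing size; the first hit is a minimum one.
    for k in range(1, n + 1):
        for subset in combinations(range(n), k):
            dominated = set()
            for v in subset:
                dominated |= out[v]
            if len(dominated) == n:
                return k, subset
    return 0, ()  # only reached for the empty digraph (n == 0)
```

For instance, with arcs `{(0, 1), (0, 2), (3, 2)}` on four vertices, no single vertex dominates all others, but `{0, 3}` does, so the domination number is 2.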

We construct the proximity regions using two data sets $\mathcal{X}_n$ and $\mathcal{Y}_m$ of sizes $n$ and $m$ from classes $\mathcal{X}$ and $\mathcal{Y}$, respectively. Given $\mathcal{Y}_m \subseteq \Omega$, the proximity map $N_{\mathcal{Y}}(\cdot) : \Omega \to 2^{\Omega}$ associates a proximity region $N_{\mathcal{Y}}(x) \subseteq \Omega$ with each point $x \in \Omega$. The region $N_{\mathcal{Y}}(x)$ is defined in terms of the distance between $x$ and $\mathcal{Y}_m$. More specifically, our proportional-edge proximity maps will be based on the relative position of points from $\mathcal{X}_n$ with respect to the Delaunay tessellation of $\mathcal{Y}_m$. In this article, a triangle refers to the closed region bounded by its edges. See Figure 1 for an example with $n = 200$ $\mathcal{X}$ points iid $\mathcal{U}((0,1) \times (0,1))$, the uniform distribution on the unit square; the Delaunay triangulation (which yields 13 triangles) is based on $m = 10$ $\mathcal{Y}$ points which are also iid $\mathcal{U}((0,1) \times (0,1))$, and 77 of these $\mathcal{X}$ points are inside the convex hull of the $\mathcal{Y}$ points.




Figure 1: A realization of 200 $\mathcal{X}$ points (pluses, +) and the Delaunay triangulation based on 10 $\mathcal{Y}$ points (circles, o) (left), and the 77 $\mathcal{X}$ points which are in the convex hull of the $\mathcal{Y}$ points (right). Both $\mathcal{X}_n$ and $\mathcal{Y}_m$ are random samples from $\mathcal{U}((0,1) \times (0,1))$, the uniform distribution on the unit square.



If $\mathcal{X}_n = \{X_1, \ldots, X_n\}$ is a set of $\Omega$-valued random variables, then $N_{\mathcal{Y}}(X_i)$ and $\Gamma_1(X_i)$ are random sets. If the $X_i$ are iid, then so are the random sets $N_{\mathcal{Y}}(X_i)$. The same holds for $\Gamma_1(X_i)$. We define the data-random proximity catch digraph $D$ (associated with $N_{\mathcal{Y}}(\cdot)$) with vertex set $\mathcal{X}_n = \{X_1, \ldots, X_n\}$ and arc set $\mathcal{A}$ by

$$(X_i, X_j) \in \mathcal{A} \iff X_j \in N_{\mathcal{Y}}(X_i).$$

Since this relationship is not symmetric, a digraph is used rather than a graph. The random digraph $D$ depends on the (joint) distribution of the $X_i$ and on the map $N_{\mathcal{Y}}(\cdot)$. For $\mathcal{X}_n = \{X_1, \ldots, X_n\}$, a set of iid random variables from $F$, the domination number of the associated data-random PCD based on the proximity map $N(\cdot)$, denoted $\gamma(\mathcal{X}_n, N)$, is the minimum number of points that dominate all points in $\mathcal{X}_n$. The random variable $\gamma(\mathcal{X}_n, N)$ depends explicitly on $\mathcal{X}_n$ and $N(\cdot)$ and implicitly on $F$. Furthermore, in general, the distribution, hence the expectation $\mathbf{E}[\gamma(\mathcal{X}_n, N)]$, depends on $n$, $F$, and $N$; $1 \le \mathbf{E}[\gamma(\mathcal{X}_n, N)] \le n$. In general, the variance of $\gamma(\mathcal{X}_n, N)$ satisfies $0 \le \mathbf{Var}[\gamma(\mathcal{X}_n, N)] \le n^2/4$. For example, the CCCD of Priebe et al. (2001) can be viewed as an example of PCDs and is briefly discussed in the next section. We use some of the properties of CCCD in $\mathbb{R}$ as guidelines in defining PCDs in higher dimensions.



2.1 Spherical Proximity Maps 



Priebe et al. (2001) introduced the class cover catch digraphs (CCCDs) and gave the exact and the asymptotic distribution of the domination number of the CCCD based on two sets, $\mathcal{X}_n$ and $\mathcal{Y}_m$, which are of sizes $n$ and $m$, from classes $\mathcal{X}$ and $\mathcal{Y}$, respectively, and are sets of iid random variables from the uniform distribution on a compact interval in $\mathbb{R}$.

Let $\mathcal{Y}_m = \{y_1, \ldots, y_m\} \subset \mathbb{R}$. Then the proximity map associated with CCCD is defined as the open ball $N_S(x) := B(x, r(x))$ for all $x \in \mathbb{R}$, where $r(x) := \min_{y \in \mathcal{Y}_m} d(x,y)$ with $d(x,y)$ being the Euclidean distance between $x$ and $y$ (Priebe et al. (2001)). That is, there is an arc from $X_i$ to $X_j$ iff there exists an open ball centered at $X_i$ which is "pure" (or contains no elements) of $\mathcal{Y}_m$ in its interior, and simultaneously contains (or "catches") point $X_j$. We consider the closed ball, $\overline{B}(x, r(x))$, for $N_S(x)$ in this article. Then for $x \in \mathcal{Y}_m$, we have $N_S(x) = \{x\}$. Notice that a ball is a sphere in higher dimensions, hence the notation $N_S$. Furthermore, dependence on $\mathcal{Y}_m$ is through $r(x)$.

A natural extension of the proximity region $N_S(x)$ to $\mathbb{R}^d$ with $d > 1$ is obtained as $N_S(x) := B(x, r(x))$ where $r(x) := \min_{y \in \mathcal{Y}_m} d(x,y)$, which is called the spherical proximity map. The spherical proximity map $N_S(x)$ is well-defined for all $x \in \mathbb{R}^d$ provided that $\mathcal{Y}_m \neq \emptyset$. Extensions to $\mathbb{R}^2$ and higher dimensions with the spherical proximity map, with applications in classification, are investigated by DeVinney et al. (2002), Marchette and Priebe (2003), Priebe et al. (2003a, b), and DeVinney and Priebe (2006).



2.2 The Proportional-Edge Proximity Maps 

First, we describe the construction of the $r$-factor proximity maps and regions, then state some of their basic properties and introduce some auxiliary tools. Note that in $\mathbb{R}$ the CCCDs are based on the intervals whose end points are from class $\mathcal{Y}$: $I_i = (y_{(i-1):m}, y_{i:m})$ for $i = 1, \ldots, (m+1)$ with $y_{0:m} = -\infty$ and $y_{(m+1):m} = \infty$, where $y_{i:m}$ is the $i$-th order statistic in $\mathcal{Y}_m$. This interval partitioning can be viewed as the Delaunay tessellation of $\mathbb{R}$ based on $\mathcal{Y}_m$. So in higher dimensions, we use the Delaunay triangulation based on $\mathcal{Y}_m$ to partition the support.

Let $\mathcal{Y}_m = \{y_1, \ldots, y_m\}$ be $m$ points in general position in $\mathbb{R}^d$ and $T_i$ be the $i$-th Delaunay cell for $i = 1, \ldots, J_m$, where $J_m$ is the number of Delaunay cells. Let $\mathcal{X}_n$ be a set of iid random variables from distribution $F$ in $\mathbb{R}^d$ with support $\mathcal{S}(F) \subseteq \mathcal{C}_H(\mathcal{Y}_m)$, where $\mathcal{C}_H(\mathcal{Y}_m)$ stands for the convex hull of $\mathcal{Y}_m$. In particular, for illustrative purposes, we focus on $\mathbb{R}^2$, where a Delaunay tessellation is a triangulation, provided that no more than three points in $\mathcal{Y}_m$ are cocircular (i.e., lie on the same circle). Furthermore, for simplicity, let $\mathcal{Y}_3 = \{y_1, y_2, y_3\}$ be three non-collinear points in $\mathbb{R}^2$ and $T(\mathcal{Y}_3) = T(y_1, y_2, y_3)$ be the triangle with vertices $\mathcal{Y}_3$. Let $\mathcal{X}_n$ be a set of iid random variables from $F$ with support $\mathcal{S}(F) \subseteq T(\mathcal{Y}_3)$. If $F = \mathcal{U}(T(\mathcal{Y}_3))$, a composition of translation, rotation, reflections, and scaling will take any given triangle $T(\mathcal{Y}_3)$ to the basic triangle $T_b = T((0,0), (1,0), (c_1, c_2))$ with $0 < c_1 \le 1/2$, $c_2 > 0$, and $(1-c_1)^2 + c_2^2 \le 1$, preserving uniformity. That is, if $X \sim \mathcal{U}(T(\mathcal{Y}_3))$ is transformed in the same manner to, say, $X'$, then we have $X' \sim \mathcal{U}(T_b)$. In fact this will hold for any distribution $F$ up to scale.



For $r \in [1, \infty]$, define $N_{PE}^r(\cdot, M) := N(\cdot, M; r, \mathcal{Y}_3)$ to be the (parametrized) proportional-edge proximity map with $M$-vertex regions as follows (see also Figure 2 with $M = M_C$ and $r = 2$). For $x \in T(\mathcal{Y}_3) \setminus \mathcal{Y}_3$, let $v(x) \in \mathcal{Y}_3$ be the vertex whose region contains $x$; i.e., $x \in R_M(v(x))$. In this article $M$-vertex regions are constructed by the lines joining any point $M \in \mathbb{R}^2 \setminus \mathcal{Y}_3$ to a point on each of the edges of $T(\mathcal{Y}_3)$. Preferably, $M$ is selected to be in the interior of the triangle, $T(\mathcal{Y}_3)^o$. For such an $M$, the corresponding vertex regions can be defined using the line segment joining $M$ to $e_j$, which lies on the line joining $y_j$ to $M$; e.g., see Figure 3 (left) for vertex regions based on the center of mass $M_C$, and Figure 3 (right) for vertex regions based on the incenter $M_I$. With $M_C$, the lines joining $M$ and $\mathcal{Y}_3$ are the median lines, which cross the edges at $M_j$ for $j = 1, 2, 3$. $M$-vertex regions, among many possibilities, can also be defined by the orthogonal projections from $M$ to the edges. See Ceyhan (2005) for a more general definition. The vertex regions in Figure 2 are center of mass vertex regions (i.e., CM-vertex regions). If $x$ falls on the boundary of two $M$-vertex regions, we assign $v(x)$ arbitrarily. Let $e(x)$ be the edge of $T(\mathcal{Y}_3)$ opposite $v(x)$. Let $\ell(v(x), x)$ be the line parallel to $e(x)$ that passes through $x$. Let $d(v(x), \ell(v(x), x))$ be the Euclidean (perpendicular) distance from $v(x)$ to $\ell(v(x), x)$. For $r \in [1, \infty)$, let $\ell_r(v(x), x)$ be the line parallel to $e(x)$ such that

$$d(v(x), \ell_r(v(x), x)) = r\, d(v(x), \ell(v(x), x)) \quad \text{and} \quad d(\ell(v(x), x), \ell_r(v(x), x)) < d(v(x), \ell_r(v(x), x)).$$

Let $T_r(x)$ be the triangle similar to and with the same orientation as $T(\mathcal{Y}_3)$ having $v(x)$ as a vertex and $\ell_r(v(x), x)$ as the opposite edge. Then the proportional-edge proximity region $N_{PE}^r(x, M)$ is defined to be $T_r(x) \cap T(\mathcal{Y}_3)$. Notice that $\ell(v(x), x)$ divides the edges of $T_r(x)$ (other than the one lying on $\ell_r(v(x), x)$) proportionally with the factor $r$. Hence the name proportional-edge proximity region.
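The construction can be sketched in code for the special case of CM-vertex regions. The sketch relies on an observation that is not spelled out in the text and should be read as an assumption of this example: in barycentric coordinates $(a_1, a_2, a_3)$ with respect to $T(\mathcal{Y}_3)$, the distance from $y_i$ to the line through $x$ parallel to the opposite edge is $(1 - a_i(x))$ times the altitude from $y_i$, and the CM-vertex region of $y_i$ is where $a_i$ is the largest coordinate. Under that reading, $z \in N_{PE}^r(x, M_C)$ iff $z$ is in the triangle and $1 - a_i(z) \le r\,(1 - a_i(x))$ for $i$ the index of $v(x)$. The function names are hypothetical.

```python
def barycentric(p, tri):
    """Barycentric coordinates of point p with respect to triangle tri."""
    (x1, y1), (x2, y2), (x3, y3) = tri
    det = (y2 - y3) * (x1 - x3) + (x3 - x2) * (y1 - y3)
    a1 = ((y2 - y3) * (p[0] - x3) + (x3 - x2) * (p[1] - y3)) / det
    a2 = ((y3 - y1) * (p[0] - x3) + (x1 - x3) * (p[1] - y3)) / det
    return (a1, a2, 1.0 - a1 - a2)


def in_pe_region(z, x, tri, r):
    """True if z lies in the proportional-edge region of x (CM-vertex regions)."""
    ax = barycentric(x, tri)
    az = barycentric(z, tri)
    if min(az) < 0.0:
        return False  # z is outside the triangle
    # Index of v(x): with the center of mass, the vertex region containing x
    # is the one whose barycentric coordinate is largest (ties are arbitrary).
    i = max(range(3), key=lambda k: ax[k])
    # z is on the v(x) side of the line l_r(v(x), x)
    return 1.0 - az[i] <= r * (1.0 - ax[i])
```

For example, in the triangle with vertices (0,0), (1,0), (0,1), the point x = (0.2, 0.2) lies in the vertex region of (0,0); with r = 3/2 its region contains (0.3, 0.2) but not (0.5, 0.3), while with r = 2 it also contains (0.4, 0.3).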




Figure 2: Construction of the proportional-edge proximity region, $N_{PE}^{r=2}(x, M_C)$ (shaded region), for an $x$ in the CM-vertex region for $y_1$, $R_{M_C}(y_1)$.

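Notice that $r \ge 1$ implies $x \in N_{PE}^r(x, M)$ for all $x \in T(\mathcal{Y}_3)$. Furthermore, $\lim_{r \to \infty} N_{PE}^r(x, M) = T(\mathcal{Y}_3)$ for all $x \in T(\mathcal{Y}_3) \setminus \mathcal{Y}_3$, so we define $N_{PE}^{\infty}(x, M) = T(\mathcal{Y}_3)$ for all such $x$. For $x \in \mathcal{Y}_3$, we define $N_{PE}^r(x, M) = \{x\}$ for all $r \in [1, \infty]$.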

The proportional-edge PCD has vertices $\mathcal{X}_n$ and arcs $(x_i, x_j)$ iff $x_j \in N_{PE}^r(x_i, M)$. See Figure 4 for a realization of $\mathcal{X}_n$ with $n = 7$ in one triangle (i.e., $m = 3$). For $r = 3/2$, the number of arcs is 12 and the domination number is $\gamma(\mathcal{X}_n, N_{PE}^{r=3/2}) = 1$; and for $r = 5/4$, the number of arcs is 9 and $\gamma(\mathcal{X}_n, N_{PE}^{r=5/4}) = 3$.

By construction, note that as $x$ gets closer to $M$ (or equivalently further away from the vertices in vertex regions), $N_{PE}^r(x, M)$ increases in area, hence it is more likely for the outdegree of $x$ to increase. So if more $\mathcal{X}$ points are around the center $M$, then it is more likely for the domination number $\gamma(\mathcal{X}_n, N_{PE}^r)$ to decrease; on the other hand, if more $\mathcal{X}$ points are around the vertices $\mathcal{Y}_3$, then the regions get smaller, hence it is more likely for the outdegree of such points to be smaller, which in turn tends to increase $\gamma(\mathcal{X}_n, N_{PE}^r)$. We exploit this probabilistic behavior of $\gamma(\mathcal{X}_n, N_{PE}^r)$ in testing spatial patterns of segregation and association.

Note also that $N_{PE}^r(x, M)$ can be viewed as a homothetic transformation (enlargement) with $r \ge 1$ applied on a translation of the region $N_{PE}^{r=1}(x, M)$. Furthermore, this transformation is also an affine similarity transformation.



Figure 3: The vertex regions constructed with center of mass $M = M_C$ (left) and incenter $M = M_I$ (right) using the line segments on the line joining each vertex in $\mathcal{Y}_3$ to $M$.



2.3 Some Auxiliary Tools Associated with PCDs 

First, notice that $N_{PE}^r(x, M)$ is similar to $T(\mathcal{Y}_3)$ with the similarity ratio being equal to

$$\frac{\min\bigl(d(v(x), e(x)),\; r\, d(v(x), \ell(v(x), x))\bigr)}{d(v(x), e(x))}.$$

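To define the $\Gamma_1$-region, let $\xi_i(x)$ be the line such that $\xi_i(x) \cap T(\mathcal{Y}_3) \neq \emptyset$ and $r\, d(y_i, \xi_i(x)) = d(y_i, \ell(y_i, x))$ for $i = 1, 2, 3$. See also Figure 5. Then $\Gamma_1^r(x, M) = \bigcup_{i=1}^{3} \bigl(\Gamma_1^r(x, M) \cap R_M(y_i)\bigr)$ where $\Gamma_1^r(x, M) \cap R_M(y_i) = \{z \in R_M(y_i) : d(y_i, \ell(y_i, z)) \ge d(y_i, \xi_i(x))\}$, for $i = 1, 2, 3$. Notice that $r \ge 1$ implies $x \in \Gamma_1^r(x, M)$. Furthermore, $\lim_{r \to \infty} \Gamma_1^r(x, M) = T(\mathcal{Y}_3)$ for all $x \in T(\mathcal{Y}_3) \setminus \mathcal{Y}_3$, and so we define $\Gamma_1^{\infty}(x, M) = T(\mathcal{Y}_3)$ for all such $x$.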

For $X_i \stackrel{iid}{\sim} F$, under the additional assumption that the non-degenerate two-dimensional probability density function $f$ exists with $\mathrm{support}(f) \subseteq T(\mathcal{Y}_3)$, the special case in the construction of $N_{PE}^r$ (namely, $X$ falling on the boundary of two vertex regions) occurs with probability zero. Note that for such an $F$, $N_{PE}^r(x, M)$ is a triangle a.s. and $\Gamma_1^r(x, M)$ is a star-shaped (not necessarily convex) polygon.

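Let $X_e := \operatorname{argmin}_{X \in \mathcal{X}_n} d(X, e)$ be the (closest) edge extremum for edge $e$ (i.e., the closest point among $\mathcal{X}_n$ to edge $e$). Then it is easily seen that $\Gamma_1^r(\mathcal{X}_n, M) = \bigcap_{i=1}^{3} \Gamma_1^r(X_{e_i}, M)$, where $e_i$ is the edge opposite vertex $y_i$, for $i = 1, 2, 3$. So $\Gamma_1^r(\mathcal{X}_n, M) \cap R_M(y_i) = \{z \in R_M(y_i) : d(y_i, \ell(y_i, z)) \ge d(y_i, \xi_i(X_{e_i}))\}$ for $i = 1, 2, 3$.

Let the domination number be $\gamma_n(r, F, M) := \gamma(\mathcal{X}_n; F, N_{PE}^r)$ and $X_{[i,1]} := \operatorname{argmin}_{X \in \mathcal{X}_n \cap R_M(y_i)} d(X, e_i)$.

Then $\gamma_n(r, F, M) \le 3$ with probability 1, since $\mathcal{X}_n \cap R_M(y_i) \subset N_{PE}^r(X_{[i,1]}, M)$ for each of $i = 1, 2, 3$. Thus

$$1 \le \mathbf{E}[\gamma_n(r, F, M)] \le 3 \quad \text{and} \quad 0 \le \mathbf{Var}[\gamma_n(r, F, M)] \le 9/4.$$

In $T(\mathcal{Y}_3)$, drawing the lines $q_i(r, x)$ such that $d(y_i, e_i) = r\, d(y_i, q_i(r, x))$ for $i \in \{1, 2, 3\}$ yields another triangle, denoted as $\mathscr{T}^r$, for $r < 3/2$. See Figure 6 for $\mathscr{T}^r$ with $r = \sqrt{2}$.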

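The functional form of $\mathscr{T}^r$ in the basic triangle $T_b$ is given by

$$\mathscr{T}^r = T(t_1(r), t_2(r), t_3(r)) = \left\{(x,y) \in T_b : y \ge \frac{c_2\,(r-1)}{r};\; y \le \frac{c_2\,(1 - r\,x)}{r\,(1-c_1)};\; y \le \frac{c_2\,(r\,(x-1)+1)}{r\,c_1}\right\} \tag{1}$$

$$= T\left(\left(\frac{(r-1)(1+c_1)}{r}, \frac{c_2\,(r-1)}{r}\right), \left(\frac{2-r+c_1\,(r-1)}{r}, \frac{c_2\,(r-1)}{r}\right), \left(\frac{c_1\,(2-r)+r-1}{r}, \frac{c_2\,(2-r)}{r}\right)\right).$$

In the standard equilateral triangle, this functional form becomes:

$$\mathscr{T}^r = T\left(\left(\frac{3\,(r-1)}{2r}, \frac{\sqrt{3}\,(r-1)}{2r}\right), \left(\frac{3-r}{2r}, \frac{\sqrt{3}\,(r-1)}{2r}\right), \left(\frac{1}{2}, \frac{\sqrt{3}\,(2-r)}{2r}\right)\right).$$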
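The vertices $t_1(r), t_2(r), t_3(r)$ in the standard equilateral triangle are easy to evaluate numerically. The sketch below assumes the closed forms $t_1(r) = (3(r-1)/(2r), \sqrt{3}(r-1)/(2r))$, $t_2(r) = ((3-r)/(2r), \sqrt{3}(r-1)/(2r))$, and $t_3(r) = (1/2, \sqrt{3}(2-r)/(2r))$, which agree with the values $t_1(5/4) = (3/10, \sqrt{3}/10)$ and $t_2(5/4) = (7/10, \sqrt{3}/10)$ quoted later in the text; the function name is hypothetical.

```python
import math


def tr_vertices(r):
    """Vertices (t1, t2, t3) of the triangle T^r in the standard equilateral
    triangle with vertices (0, 0), (1, 0), (1/2, sqrt(3)/2), for 1 <= r < 3/2."""
    if not 1.0 <= r < 1.5:
        raise ValueError("T^r is considered here only for 1 <= r < 3/2")
    s3 = math.sqrt(3.0)
    t1 = (3.0 * (r - 1.0) / (2.0 * r), s3 * (r - 1.0) / (2.0 * r))
    t2 = ((3.0 - r) / (2.0 * r), s3 * (r - 1.0) / (2.0 * r))
    t3 = (0.5, s3 * (2.0 - r) / (2.0 * r))
    return t1, t2, t3
```

At r = 1 the three points coincide with the vertices of the triangle itself, and as r approaches 3/2 they shrink toward the center of mass (1/2, sqrt(3)/6).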



Figure 4: A realization of 7 $\mathcal{X}$ points generated iid $\mathcal{U}(T(\mathcal{Y}_3))$, the uniform distribution on $T(\mathcal{Y}_3)$ (top left), and the corresponding arcs of the proportional-edge PCD with $M = M_C$ for $r = 3/2$ (top right) and $r = 5/4$ (bottom).

There is a crucial difference between the triangles $\mathscr{T}^r$ and $T(M_1, M_2, M_3)$. More specifically, $T(M_1, M_2, M_3) \subseteq \mathscr{R}_S(r, M)$ for all $M$ and $r \ge 2$, but $(\mathscr{T}^r)^o$ and the superset region $\mathscr{R}_S(r, M)$ are disjoint for all $M$ and $r$. So if $M \in (\mathscr{T}^r)^o$, then $\mathscr{R}_S(r, M) = \emptyset$; if $M \in \partial(\mathscr{T}^r)$, then $\mathscr{R}_S(r, M) = \{M\}$; and if $M \notin \mathscr{T}^r$, then $\mathscr{R}_S(r, M)$ has positive area.

See Figure 7 for two examples of superset regions with $M$ that corresponds to the circumcenter $M_{CC}$ in this triangle, where the vertex regions are constructed using orthogonal projections. For $r = 2$, note that $\mathscr{T}^r = \emptyset$ and the superset region is $T(M_1, M_2, M_3)$ (see Figure 7 (left)), while for $r = \sqrt{2}$, $(\mathscr{T}^r)^o$ and $\mathscr{R}_S(r = \sqrt{2}, M)^o$ are disjoint (see Figure 7 (right)).

The triangle $\mathscr{T}^r$ given in Equation (1) plays an important role in the distribution of the domination number of the proportional-edge PCDs.

3 The Asymptotic Distribution of Domination Number for Uniform Data

3.1 The One-Triangle Case

For simplicity, we consider $\mathcal{X}$ points iid uniform in one triangle only. The null hypothesis we consider is a type of complete spatial randomness; that is,

$$H_o : X_i \stackrel{iid}{\sim} \mathcal{U}(T(\mathcal{Y}_3)) \text{ for } i = 1, 2, \ldots, n,$$

where $\mathcal{U}(T(\mathcal{Y}_3))$ is the uniform distribution on $T(\mathcal{Y}_3)$. If it is desired to have the sample size be a random variable, we may consider a spatial Poisson point process on $T(\mathcal{Y}_3)$ as our null hypothesis. Let $\gamma_n(r, M) := \gamma(\mathcal{X}_n, \mathcal{U}(T(\mathcal{Y}_3)), N_{PE}^r, M)$ be the domination number of the PCD based on $N_{PE}^r$ with $\mathcal{X}_n$, a set of iid random variables from $\mathcal{U}(T(\mathcal{Y}_3))$, with $M$-vertex regions.

Figure 5: Construction of the $\Gamma_1$-region, $\Gamma_1^r(x, M_C)$ (shaded region).

Figure 6: The triangle $\mathscr{T}^r$ with $r = \sqrt{2}$ (the hatched region).

We present a "geometry invariance" result for $N_{PE}^r(\cdot, M)$ where $M$-vertex regions are constructed using the line segment joining $M$ to edge $e_i$ on the line joining $y_i$ to $M$, rather than the orthogonal projections from $M$ to the edges. This invariance property will simplify the notation in our subsequent analysis by allowing us to consider the special case of the (standard) equilateral triangle.

Theorem 3.1. (Geometry Invariance Property) Suppose $\mathcal{X}_n$ is a set of iid random variables from $\mathcal{U}(T(\mathcal{Y}_3))$. Then for any $r \in [1, \infty]$ the distribution of $\gamma_n(r, M)$ is independent of $\mathcal{Y}_3$ and hence the geometry of $T(\mathcal{Y}_3)$.



Proof: See Ceyhan and Priebe (2007) for the proof. ■

Note that geometry invariance of $\gamma_n(r = \infty, M)$ follows trivially for all $\mathcal{X}_n$ from any $F$ with support in $T(\mathcal{Y}_3) \setminus \mathcal{Y}_3$, since for $r = \infty$, we have $\gamma_n(r = \infty, M) = 1$ a.s. Based on Theorem 3.1, we may assume that $T(\mathcal{Y}_3)$ is a standard equilateral triangle with $\mathcal{Y}_3 = \{(0,0), (1,0), (1/2, \sqrt{3}/2)\}$ for $N_{PE}^r(\cdot, M)$ with $M$-vertex regions.

Remark 3.2. Notice that we proved the geometry invariance property for $N_{PE}^r(\cdot)$ where $M$-vertex regions are defined with the lines joining $\mathcal{Y}_3$ to $M$. On the other hand, if we use the orthogonal projections from $M$ to the edges, the vertex regions (hence $N_{PE}^r$) will depend on the geometry of the triangle. That is, the orthogonal projections from $M$ to the edges will not be mapped to the orthogonal projections in the standard equilateral triangle. Hence with the orthogonal projections, the exact and asymptotic distribution of $\gamma_n(r, M)$ will depend on $c_1, c_2$ of $T_b$, so one needs to do the calculations for each possible combination of $c_1, c_2$. □



Figure 7: The superset regions (the shaded regions) constructed with circumcenter $M_{CC}$ with $r = \sqrt{2}$ (left) and $r = 2$ (right) with vertex regions constructed with orthogonal projections to the edges.



The domination number $\gamma_n(r, M)$ of the PCD has the following asymptotic distribution (Ceyhan and Priebe (2007)). As $n \to \infty$,

$$
\gamma_n(r, M) \stackrel{\mathcal{L}}{\longrightarrow}
\begin{cases}
2 + \mathrm{BER}(1 - p_r), & \text{for } r \in [1, 3/2) \text{ and } M \in \{t_1(r), t_2(r), t_3(r)\},\\
1, & \text{for } r > 3/2 \text{ and } M \in T(\mathcal{Y}_3)^o,\\
3, & \text{for } r \in [1, 3/2) \text{ and } M \in \mathscr{T}^r \setminus \{t_1(r), t_2(r), t_3(r)\},
\end{cases}
\tag{2}
$$

where $\stackrel{\mathcal{L}}{\longrightarrow}$ stands for "convergence in law" and $\mathrm{BER}(p)$ stands for the Bernoulli distribution with probability of success $p$, $\mathscr{T}^r$ and $t_i(r)$ are defined in Equation (1), and for $r \in [1, 3/2)$ and $M \in \{t_1(r), t_2(r), t_3(r)\}$,

$$
p_r = \int_0^{\infty} \int_0^{\infty} \frac{64\, r^2}{9\,(r-1)^2}\, w_1\, w_3 \exp\left(-\frac{4\, r}{3\,(r-1)}\,\bigl(w_1^2 + w_3^2 + 2\, r\,(r-1)\, w_1 w_3\bigr)\right) dw_3\, dw_1, \tag{3}
$$

and for $r = 3/2$ and $M = M_C = (1/2, \sqrt{3}/6)$, $p_r \approx 0.7413$, which is not computed as in Equation (3); for its computation, see Ceyhan and Priebe (2005). For example, for $r = 5/4$ and $M \in \{t_1(r) = (3/10, \sqrt{3}/10),\ t_2(r) = (7/10, \sqrt{3}/10),\ t_3(r) = (1/2, 3\sqrt{3}/10)\}$, $p_r \approx 0.6514$. See Figure 8 for the plot of the numerically computed values (i.e., the values computed by numerical integration) of $p_r$ as a function of $r$ according to Equation (3). Notice that in the nondegenerate case in (2), $\mathbf{E}[\gamma_n(r, M)] = 3 - p_r$ and $\mathbf{Var}[\gamma_n(r, M)] = p_r\,(1 - p_r)$.
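The value $p_r \approx 0.6514$ at $r = 5/4$ can be checked with a crude numerical integration. The sketch below is an illustration under two stated assumptions. First, the exponent in Equation (3) is taken with a negative sign, which is needed for the integral to converge and which reproduces the quoted value. Second, the substitution $w_k = u_k \sqrt{3(r-1)/(4r)}$ is applied, which turns the integral into $4 \int_0^\infty \int_0^\infty u\, v\, \exp(-(u^2 + v^2 + b\, u\, v))\, du\, dv$ with $b = 2r(r-1)$. The function name, the midpoint rule, and the truncation of the range at 6 are choices made for this example only.

```python
import math


def p_r(r, n=400, upper=6.0):
    """Approximate p_r of Equation (3) for 1 < r < 3/2 by a 2-D midpoint rule.

    After rescaling, p_r = 4 * integral of u*v*exp(-(u^2 + v^2 + b*u*v)) over
    the positive quadrant, with b = 2*r*(r-1). The integrand is negligible
    beyond `upper`, so the range is truncated there.
    """
    if not 1.0 < r < 1.5:
        raise ValueError("this sketch covers 1 < r < 3/2 only")
    b = 2.0 * r * (r - 1.0)
    h = upper / n
    total = 0.0
    for i in range(n):
        u = (i + 0.5) * h
        for j in range(n):
            v = (j + 0.5) * h
            total += u * v * math.exp(-(u * u + v * v + b * u * v))
    return 4.0 * total * h * h
```

Calling `p_r(1.25)` gives a value close to 0.6514, in agreement with the figure quoted in the text.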

In Equation (2), the first line is referred to as the non-degenerate case; the second and third lines are referred to as degenerate cases with a.s. limits 1 and 3, respectively.
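The moments quoted for the non-degenerate case follow in one line once the limit is written as $2 + B$ with $B \sim \mathrm{BER}(1 - p_r)$, which is the reading consistent with $p_r = \lim_{n \to \infty} P(\gamma_n(r, M) = 2)$. The short derivation below is a sketch added for convenience; the numerical values use the quoted $p_r \approx 0.6514$ at $r = 5/4$.

```latex
\begin{align*}
\mathbf{E}[2 + B]   &= 2 + (1 - p_r) = 3 - p_r, \\
\mathbf{Var}[2 + B] &= \mathbf{Var}[B] = (1 - p_r)\,p_r. \\
\intertext{For $r = 5/4$, with $p_r \approx 0.6514$:}
\mathbf{E}   &\approx 3 - 0.6514 = 2.3486, \\
\mathbf{Var} &\approx 0.6514 \times 0.3486 \approx 0.2271.
\end{align*}
```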
Table 1: The number of $\gamma_n(r, M) = k$ out of $N = 1000$ Monte Carlo replicates with $M = M_C$ and $r = 2$ (left) and $r = 5/4$ (right). Here, "$r = 2$ and $M = M_C$" is an example of the case "$r > 3/2$ and $M \in T(\mathcal{Y}_3)^o$", and "$r = 5/4$ and $M = M_C$" is an example of the case "$r \in [1, 3/2)$ and $M \in \mathscr{T}^r \setminus \{t_1(r), t_2(r), t_3(r)\}$".

We also estimate the distribution of $\gamma_n(r, M)$ for various values of $n$, $r$, and $M$ using Monte Carlo simulations. At each Monte Carlo replication, we generate $n$ points iid $\mathcal{U}(T(\mathcal{Y}_3))$ and compute the value of $\gamma_n(r, M)$. The frequencies of $\gamma_n(r, M) = k$ out of $N = 1000$ Monte Carlo replicates are presented in Tables 1, 2, and 3. Notice that in Table 1 (left) "$r = 2$ and $M = M_C$" is an example of the case "$r > 3/2$ and $M \in T(\mathcal{Y}_3)^o$"; in Table 1 (right) "$r = 5/4$ and $M = M_C$" is an example of the case "$r \in [1, 3/2)$ and $M \in \mathscr{T}^r \setminus \{t_1(r), t_2(r), t_3(r)\}$"; in Table 2 (top) "$r = 5/4$ and $M = (3/5, \sqrt{3}/10)$" is an example of the case "$r \in [1, 3/2)$ and $M \in \mathscr{T}^r \setminus \{t_1(r), t_2(r), t_3(r)\}$" with $M$ being on the line segment joining $t_1(r)$ and $t_2(r)$; in Table 2 (bottom) "$r = 5/4$ and $M = (7/10, \sqrt{3}/10)$" is an example of the case "$r \in [1, 3/2)$ and $M \in \{t_1(r), t_2(r), t_3(r)\}$" with $M = t_2(r)$; and in Table 3 "$r = 3/2$ and $M = M_C$" is an example of the case discussed in Ceyhan and Priebe (2005). Notice that as the sample size $n$ increases, the values in these tables get closer and closer to the expected values under their asymptotic distributions.

Figure 8: Plotted is the probability $p_r = \lim_{n \to \infty} P(\gamma_n(r, M) = 2)$ given in Equation (3) as a function of $r$ for $r \in [1, 3/2)$ and $M \in \{t_1(r), t_2(r), t_3(r)\}$.

Table 2: The number of $\gamma_n(r, M) = k$ out of $N = 1000$ Monte Carlo replicates with $r = 5/4$ and $M = (3/5, \sqrt{3}/10)$ (top) and $M = (7/10, \sqrt{3}/10)$ (bottom). Here "$r = 5/4$ and $M = (3/5, \sqrt{3}/10)$" is an example of the case "$r \in [1, 3/2)$ and $M \in \mathscr{T}^r \setminus \{t_1(r), t_2(r), t_3(r)\}$" with $M$ being on the line segment joining $t_1(r)$ and $t_2(r)$, and "$r = 5/4$ and $M = (7/10, \sqrt{3}/10)$" is an example of the case "$r \in [1, 3/2)$ and $M \in \{t_1(r), t_2(r), t_3(r)\}$" with $M = t_2(r)$.

Theorem 3.3. Let $\gamma_n(r, M) = \gamma(\mathcal{X}_n; \mathcal{U}(T(\mathcal{Y}_3)), N_{PE}^r, M)$. Then $r_1 < r_2$ implies $\gamma_n(r_2, M) <^{ST} \gamma_n(r_1, M)$, where $<^{ST}$ stands for "stochastically smaller than".

Proof: Suppose $r_1 < r_2$. Then $P(\gamma_n(r_2,M) = 1) > P(\gamma_n(r_1,M) = 1)$ and $P(\gamma_n(r_2,M) = 2) > P(\gamma_n(r_1,M) = 2)$ and $P(\gamma_n(r_2,M) = 3) < P(\gamma_n(r_1,M) = 3)$. Hence the desired result follows. ■



3.2 The Multi-Triangle Case 

Table 3: The number of $\gamma_n(3/2, M_C) = k$ out of N = 1000 Monte Carlo replicates. Here "r = 3/2 and $M = M_C$" is an example of the case discussed in Ceyhan and Priebe (2005).

In this section, we present the asymptotic distribution of the domination number of proportional-edge PCDs in multiple Delaunay triangles. Suppose $\mathcal{Y}_m = \{y_1, y_2, \ldots, y_m\} \subset \mathbb{R}^2$ is a set of m points in general position with m > 3 such that no more than 3 points are cocircular. Then there are $J_m > 1$ Delaunay triangles, each of which is denoted as $T_j$ (Okabe et al. (2000)). We wish to investigate



$$
H_o : X_i \stackrel{iid}{\sim} \mathcal{U}(C_H(\mathcal{Y}_m)) \text{ for } i = 1, 2, \ldots, n
\tag{4}
$$

against segregation and association alternatives (see Section 4). Figure 13 (middle) presents a realization of 1000 observations independent and identically distributed according to $\mathcal{U}(C_H(\mathcal{Y}_m))$ for m = 10 and $J_m = 13$.

Let $M^j$ be the point in $T_j$ that corresponds to M in $T_e$, $\mathscr{T}_r^j$ be the triangle that corresponds to $\mathscr{T}_r$ in $T_e$, and $t_i^j(r)$ be the vertices of $\mathscr{T}_r^j$ that correspond to $t_i(r)$ in $T_e$ for $i \in \{1, 2, 3\}$. Moreover, let $n_j := |\mathcal{X}_n \cap T_j|$, the number of X points in Delaunay triangle $T_j$. The digraph D is constructed using $N_{PE}^r(\cdot, M^j)$ as described above, where the three points in $\mathcal{Y}_m$ defining the Delaunay triangle $T_j$ are used as $\mathcal{Y}_m(j)$. Then we have $J_m$ disconnected sub-digraphs. For $\mathcal{X}_n \subset C_H(\mathcal{Y}_m)$, let $\gamma_{n_j}(r, M^j)$ be the domination number of the digraph induced by vertices of $T_j$ and $\mathcal{X}_n \cap T_j$. Then the domination number of the proportional-edge PCD in $J_m$ triangles is

$$
\gamma_{n,m}(r, M) = \sum_{j=1}^{J_m} \gamma_{n_j}(r, M^j).
$$

See Figure 9 for two examples of the proportional-edge PCDs based on the 77 X points that are in $C_H(\mathcal{Y}_m)$ out of the 200 X points plotted in Figure 1. The arcs are constructed for $M = M_C$ with r = 3/2 (left) and r = 5/4 (right), and the corresponding domination number values are $\gamma_{n,10}(3/2, M_C) = 22$ and $\gamma_{n,10}(5/4, M_C) = 26$. Suppose $\mathcal{X}_n$ is a set of iid random variables from $\mathcal{U}(C_H(\mathcal{Y}_m))$, the uniform distribution on the convex hull of $\mathcal{Y}_m$, and we construct the proportional-edge PCDs using the points $M^j$ that correspond to M in $T_e$. Then for fixed m (or fixed $J_m$), as $n \to \infty$, so does each $n_j$. Furthermore, as $n \to \infty$, the components $\gamma_{n_j}(r, M^j)$ become independent. Therefore, using Equation (2), we can obtain the asymptotic distribution of $\gamma_{n,m}(r, M)$. For fixed $J_m$, as $n \to \infty$,

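$$
\gamma_{n,m}(r,M) \xrightarrow{\mathcal{L}}
\begin{cases}
2 J_m + \mathrm{BIN}(J_m, 1 - p_r) & \text{for } M^j \in \{t_1^j(r), t_2^j(r), t_3^j(r)\} \text{ and } r \in [1,3/2],\\
J_m & \text{for } r > 3/2 \text{ and for all } M \in T(\mathcal{Y}_3)^o,\\
3 J_m & \text{for } M^j \in \mathscr{T}_r^j \setminus \{t_1^j(r), t_2^j(r), t_3^j(r)\} \text{ and } r \in [1, 3/2),
\end{cases}
\tag{5}
$$

where BIN(n,p) stands for the binomial distribution with n trials and probability of success p; for $r \in [1, 3/2)$ and $M \in \{t_1(r), t_2(r), t_3(r)\}$, $p_r$ is given in Equation (3); and $j = 1, 2, \ldots, J_m$. Observe that in the nondegenerate case in Equation (5), we have $E[\gamma_{n,m}(r, M)] = J_m (3 - p_r)$ and $\mathrm{Var}[\gamma_{n,m}(r, M)] = J_m\, p_r (1 - p_r)$.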
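The moment identities for the nondegenerate case can be checked directly from the probability mass function of $2J_m + \mathrm{BIN}(J_m, 1 - p_r)$. The sketch below is illustrative only; the function names are hypothetical, and the inputs $J_m = 13$ and $p_r = 0.6514$ are simply values that appear elsewhere in the text.

```python
from math import comb

def asymptotic_pmf(J, p):
    """pmf of 2J + BIN(J, 1 - p); keys are the possible values 2J, ..., 3J."""
    q = 1.0 - p
    return {2 * J + k: comb(J, k) * q ** k * p ** (J - k) for k in range(J + 1)}

def mean_and_variance(pmf):
    """Mean and variance of a discrete distribution given as {value: prob}."""
    mean = sum(v * pr for v, pr in pmf.items())
    var = sum((v - mean) ** 2 * pr for v, pr in pmf.items())
    return mean, var

J, p = 13, 0.6514
mean, var = mean_and_variance(asymptotic_pmf(J, p))
# Compare with the closed forms J(3 - p_r) and J p_r (1 - p_r).
print(mean, J * (3 - p))
print(var, J * p * (1 - p))
```

Both printed pairs should agree up to floating-point rounding, which is just the usual mean and variance of a shifted binomial.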

Theorem 3.4. (Asymptotic Normality) Suppose $n_j$ and $J_m$ are sufficiently large with $n_j \gg J_m$. Then the asymptotic null distribution of the mean domination number (per triangle)

$$
G(r, M) := \frac{1}{J_m} \sum_{j=1}^{J_m} \gamma_{n_j}(r, M^j) = \frac{\gamma_{n,m}(r,M)}{J_m}
$$

is approximately normal; i.e., for large $n_j \gg J_m$,

$$
G(r, M) \stackrel{approx}{\sim} \mathcal{N}(\mu, \sigma^2/J_m),
$$

where $\mu = 3 - p_r$ and $\sigma^2 = p_r(1 - p_r)$.



Figure 9: The arcs for the 77 X points (dots) in the convex hull of $\mathcal{Y}$ points (circles) given in Figure 1 for the proportional-edge PCD with $M = M_C$ for r = 3/2 (left) and r = 5/4 (right).



Proof: For fixed $J_m$ sufficiently large and each $n_j$ sufficiently large with $n = \sum_{j=1}^{J_m} n_j \gg J_m$, the $\gamma_{n_j}(r, M^j)$ are approximately independent and identically distributed as in Equation (2). Then the desired result follows. ■

In Figure 10 (top), we plot the histograms and the approximating normal curves for G(r,M) with r = 3/2 and $M = M_C$ for n = 100, 1000, and 5000 X points generated iid $\mathcal{U}(C_H(\mathcal{Y}_m))$, where $\mathcal{Y}_m$ (which yields $J_m = 13$ triangles) is given in Figure 1. Notice that, even though the distribution looks symmetric with n = 100, the normal approximation is not appropriate, since not all $n_j$ are sufficiently large to make the binomial distribution hold as in Equation (5). As n increases (see the n = 1000 and n = 5000 cases), the histograms and the corresponding normal curves become more similar, indicating that the asymptotic normal approximation gets better, since all $n_j$ are sufficiently large. However, larger $J_m$ values require larger sample sizes in order to obtain approximate normality. With $J_{20} = 30$ triangles based on the Delaunay triangulation of 20 $\mathcal{Y}$ points iid uniform on the unit square (not presented), we plot the histograms and the approximating normal curves for r = 3/2 and $M = M_C$ in Figure 10 (bottom). Observe that with more triangles (i.e., as $J_m$ increases), the normal approximation gets better. We also present the histograms of the mean domination number and the approximating normal curves for r = 5/4 and $M = (7/10, \sqrt{3}/10)$ in Figure 11, where the trend is similar to the one in Figure 10 (top).

For finite n, let G(r, M) be the mean domination number (per triangle) associated with the digraph based on $N_{PE}^r$. Then as a corollary to Theorem 3.3 it follows that for $r_1 < r_2$, we have $G(r_2, M) <^{ST} G(r_1, M)$.



4 Alternative Patterns: Segregation and Association 

In a two class setting, the phenomenon known as segregation occurs when members of one class have a 
tendency to repel members of the other class. For instance, it may be the case that one type of plant does 
not grow well in the vicinity of another type of plant, and vice versa. This implies, in our notation, that the $X_i$ are unlikely to be located near any elements of $\mathcal{Y}$. Alternatively, association occurs when members of one class have a tendency to attract members of the other class, as in symbiotic species, so that the $X_i$ will tend to cluster around the elements of $\mathcal{Y}$, for example. See, for instance, Dixon (1994) and Coomes et al. (1999).



Figure 10: Depicted in the top row are $G(r = 3/2, M = M_C) \stackrel{approx}{\sim} \mathcal{N}(\mu \approx 2.2587, \sigma^2/J_{10} \approx .1918/J_{10})$ for $J_{10} = 13$ and n = 100 (left), n = 1000 (middle), and n = 5000 (right). In the bottom row, depicted are $G(r = 3/2, M = M_C) \stackrel{approx}{\sim} \mathcal{N}(\mu \approx 2.2587, \sigma^2/J_{20} \approx .1918/J_{20})$ for $J_{20} = 30$ and n = 100 (left), n = 1000 (middle), and n = 5000 (right). Histograms are based on 1000 Monte Carlo replicates and the curves are the associated approximating normal curves.

These alternatives can be parametrized as follows: In the one-triangle case, without loss of generality let $\mathcal{Y}_3 = \{(0,0), (1,0), (c_1, c_2)\}$ and $T_b = T(\mathcal{Y}_3)$ with $y_1 = (0,0)$, $y_2 = (1,0)$, and $y_3 = (c_1, c_2)$. For the basic triangle $T_b$, let $Q_\theta := \{x \in T_b : d(x, \mathcal{Y}_3) \le \theta\}$ for $\theta \in (0, (c_1^2 + c_2^2)/2]$ and let S(F) be the support of F. Then consider

$$
\mathscr{H}_S := \{F : S(F) \subseteq T_b \text{ and } P_F(X \in Q_\theta) < P_U(X \in Q_\theta)\}
$$

and

$$
\mathscr{H}_A := \{F : S(F) \subseteq T_b \text{ and } P_F(X \in Q_\theta) > P_U(X \in Q_\theta)\},
$$

where $P_F$ and $P_U$ are probabilities with respect to distribution function F and the uniform distribution on $T_b$, respectively. So if $X_i \sim F \in \mathscr{H}_S$, the pattern between $\mathcal{X}$ and $\mathcal{Y}$ points is segregation, but if $X_i \sim F \in \mathscr{H}_A$, the pattern between $\mathcal{X}$ and $\mathcal{Y}$ points is association. For example, the distribution family

$$
\mathscr{F}_S := \{F : S(F) \subseteq T_b \text{ and the associated pdf } f \text{ increases as } d(x, \mathcal{Y}_3) \text{ increases}\}
$$

is a subset of $\mathscr{H}_S$ and yields samples from the segregation alternatives. Likewise, the distribution family

$$
\mathscr{F}_A := \{F : S(F) \subseteq T_b \text{ and the associated pdf } f \text{ increases as } d(x, \mathcal{Y}_3) \text{ decreases}\}
$$

is a subset of $\mathscr{H}_A$ and yields samples from the association alternatives.

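Figure 11: Depicted are $G(r = 5/4, M = (7/10, \sqrt{3}/10)) \stackrel{approx}{\sim} \mathcal{N}(\mu \approx 2.3486, \sigma^2/J_{10} \approx .2271/J_{10})$ for $J_{10} = 13$ and n = 100 (left), n = 1000 (middle), and n = 5000 (right). Histograms are based on 1000 Monte Carlo replicates and the curves are the associated approximating normal curves.

Figure 12: An example for the segregation alternative for a particular $\varepsilon$ (shaded region), and its complement is for the association alternative (unshaded region) on the standard equilateral triangle.

In the basic triangle, $T_b$, we define $H^S_\varepsilon$ and $H^A_\varepsilon$ with $\varepsilon \in (0, \sqrt{3}/3)$, for segregation and association alternatives, respectively. Under $H^S_\varepsilon$, $4\varepsilon^2/3 \times 100\%$ of the area of $T_b$ is chopped off around each vertex so that the X points are restricted to lie in the remaining region. That is, for $y_j \in \mathcal{Y}_3$, let $e_j$ denote the edge of $T_b$ opposite vertex $y_j$ for j = 1, 2, 3, and for $x \in T_b$ let $\ell_j(x)$ denote the line parallel to $e_j$ through x. Then define $T_j(\varepsilon) = \{x \in T_b : d(y_j, \ell_j(x)) \le \varepsilon_j\}$ where

$$
\varepsilon_1 = \frac{2 c_2\, \varepsilon}{\sqrt{3}\sqrt{c_2^2 + (1 - c_1)^2}}, \quad
\varepsilon_2 = \frac{2 c_2\, \varepsilon}{\sqrt{3}\sqrt{c_1^2 + c_2^2}}, \quad \text{and} \quad
\varepsilon_3 = \frac{2 c_2\, \varepsilon}{\sqrt{3}}.
$$

Let $T_\varepsilon := \bigcup_{j=1}^{3} T_j(\varepsilon)$. Then under $H^S_\varepsilon$ we have $X_i \stackrel{iid}{\sim} \mathcal{U}(T_b \setminus T_\varepsilon)$. Similarly, under $H^A_\varepsilon$ we have $X_i \stackrel{iid}{\sim} \mathcal{U}(T_{\sqrt{3}/3 - \varepsilon})$. Thus the segregation model excludes the possibility of any $X_i$ occurring around a $y_j$, and the association model requires that all $X_i$ occur around the $y_j$'s. The $\sqrt{3}/3 - \varepsilon$ is used in the definition of the association alternative so that $\varepsilon = 0$ yields $H_o$ under both classes of alternatives. Thus, we have the following distribution families under this parametrization:

$$
\mathscr{U}^S_\varepsilon := \{F : F = \mathcal{U}(T_b \setminus T_\varepsilon)\} \quad \text{and} \quad \mathscr{U}^A_\varepsilon := \{F : F = \mathcal{U}(T_{\sqrt{3}/3 - \varepsilon})\}.
\tag{6}
$$

Clearly $\mathscr{U}^S_\varepsilon \subsetneq \mathscr{H}_S$ and $\mathscr{U}^A_\varepsilon \subsetneq \mathscr{H}_A$, but $\mathscr{U}^S_\varepsilon \not\subseteq \mathscr{F}_S$ and $\mathscr{U}^A_\varepsilon \not\subseteq \mathscr{F}_A$.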

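These alternatives $H^S_\varepsilon$ and $H^A_\varepsilon$ with $\varepsilon \in (0, \sqrt{3}/3)$ can be transformed into the equilateral triangle as in Ceyhan et al. (2006) and Ceyhan et al. (2000).

For the standard equilateral triangle, in $T_j(\varepsilon) = \{x \in T_e : d(y_j, \ell_j(x)) \le \varepsilon_j\}$ we have $\varepsilon_1 = \varepsilon_2 = \varepsilon_3 = \varepsilon$. Thus $H^S_\varepsilon$ implies $X_i \stackrel{iid}{\sim} \mathcal{U}(T_e \setminus T_\varepsilon)$, and $H^A_\varepsilon$ is the model under which $X_i \stackrel{iid}{\sim} \mathcal{U}(T_{\sqrt{3}/3 - \varepsilon})$. See Figure 12 for a depiction of the above segregation and association alternatives in $T_e$.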

Remark 4.1. These definitions of the alternatives $H^S_\varepsilon$ and $H^A_\varepsilon$ are given for the standard equilateral triangle. The geometry invariance result of Theorem 3.1 still holds under the alternatives $H^S_\varepsilon$ and $H^A_\varepsilon$. In particular, the segregation alternative with $\varepsilon \in (0, \sqrt{3}/4)$ in the standard equilateral triangle corresponds to the case that in an arbitrary triangle, $\delta \times 100\%$ of the area is carved away as forbidden from the vertices using line segments parallel to the opposite edge, where $\delta = 4\varepsilon^2$ (which implies $\delta \in (0, 3/4)$). But the segregation alternative with $\varepsilon \in (\sqrt{3}/4, \sqrt{3}/3)$ in the standard equilateral triangle corresponds to the case that in an arbitrary triangle, $\delta \times 100\%$ of the area is carved away as forbidden from each vertex using line segments parallel to the opposite edge, where $\delta = 1 - 4(1 - \sqrt{3}\varepsilon)^2$ (which implies $\delta \in (3/4, 1)$). This argument is for the segregation alternative; a similar construction is available for the association alternative. □
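To make the segregation construction concrete, here is a minimal rejection-sampling sketch for the standard equilateral triangle with vertices (0,0), (1,0), and $(1/2, \sqrt{3}/2)$. It is not the paper's simulation code. It assumes the observation (ours) that $d(y_j, \ell_j(x))$ equals the triangle height times $1 - \lambda_j$, where $\lambda_j$ is the barycentric coordinate of x at $y_j$, and it takes the second area formula to be $\delta = 1 - 4(1 - \sqrt{3}\varepsilon)^2$. All function names are hypothetical.

```python
import math
import random

H = math.sqrt(3) / 2  # height of the standard equilateral triangle T_e
VERTICES = [(0.0, 0.0), (1.0, 0.0), (0.5, H)]

def uniform_barycentric(rng):
    """Barycentric coordinates of a point drawn uniformly from a triangle."""
    u, v = rng.random(), rng.random()
    if u + v > 1.0:  # reflect the upper half of the unit square
        u, v = 1.0 - u, 1.0 - v
    return (1.0 - u - v, u, v)

def in_forbidden_region(lam, eps):
    """True if the point lies in some T_j(eps).

    d(y_j, l_j(x)) = H * (1 - lam_j), so d <= eps iff lam_j >= 1 - eps / H.
    """
    return max(lam) >= 1.0 - eps / H

def sample_segregation(n, eps, rng):
    """Illustrative rejection sampler for the uniform law on T_e minus T_eps."""
    points = []
    while len(points) < n:
        lam = uniform_barycentric(rng)
        if not in_forbidden_region(lam, eps):
            x = sum(l * vx for l, (vx, _) in zip(lam, VERTICES))
            y = sum(l * vy for l, (_, vy) in zip(lam, VERTICES))
            points.append((x, y))
    return points

def delta(eps):
    """Total removed area fraction in the two regimes of Remark 4.1."""
    if eps <= math.sqrt(3) / 4:
        return 4.0 * eps ** 2
    return 1.0 - 4.0 * (1.0 - math.sqrt(3) * eps) ** 2

def forbidden_fraction(eps, n_trials, rng):
    """Monte Carlo estimate of the removed area fraction."""
    hits = sum(in_forbidden_region(uniform_barycentric(rng), eps)
               for _ in range(n_trials))
    return hits / n_trials

rng = random.Random(0)
print(forbidden_fraction(math.sqrt(3) / 8, 40000, rng), delta(math.sqrt(3) / 8))
```

The Monte Carlo estimate and `delta` should agree up to sampling noise in both regimes; for $\varepsilon = \sqrt{3}/8$ the removed fraction is $\delta = 3/16$. Rejection sampling becomes slow as $\varepsilon$ approaches $\sqrt{3}/3$, where almost all of the triangle is forbidden.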



4.1 Asymptotic Distribution under the Alternatives 

Let $\gamma_n^S(F, r, M)$, $F \in \mathscr{H}_S$, be the domination number under segregation. Under this alternative with $M = M_C$, the domination number will have a discrete distribution with $p_j := P(\gamma_n^S = j)$ for j = 1, 2, 3 and $p_1 + p_2 + p_3 = 1$. Clearly the $p_j$ values depend on the distribution F, and their explicit forms for finite n or in the asymptotics are not always analytically tractable. The same holds for the domination number under association, $\gamma_n^A(F, r, M)$, $F \in \mathscr{H}_A$.

However, under the alternatives $H^S_\varepsilon$ and $H^A_\varepsilon$, the asymptotic distribution of the domination number is much easier to find. Let $\gamma_n^S(\varepsilon, r, M)$ and $\gamma_n^A(\varepsilon, r, M)$ be the domination numbers under segregation and association alternatives, respectively. Under $H^S_\varepsilon$ with $M = M_C$, the distribution of the domination number is nondegenerate when $r = 3/2 - \varepsilon\sqrt{3}/2$, which implies $r \in (9/8, 3/2)$ for $\varepsilon \in (0, \sqrt{3}/4)$, and $r \in (1, 9/8)$ for $\varepsilon \in (\sqrt{3}/4, \sqrt{3}/3)$. In particular, the asymptotic distribution of the domination number for uniform data in one triangle is as follows. As $n \to \infty$, under $H^S_\varepsilon$ with $M = M_C$ and $\varepsilon \in (0, \sqrt{3}/4)$,

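$$
\gamma_n^S(\varepsilon, r, M_C) \xrightarrow{\mathcal{L}}
\begin{cases}
2 + \mathrm{BER}(1 - p^S_\varepsilon) & \text{for } r = 3/2 - \varepsilon\sqrt{3}/2,\\
1 & \text{for } r > 3/2,\\
2 & \text{for } 3/2 - \varepsilon\sqrt{3}/2 < r < 3/2,\\
3 & \text{for } 9/8 < r < 3/2 - \varepsilon\sqrt{3}/2,
\end{cases}
\tag{7}
$$

where $p^S_\varepsilon$ can be calculated similarly as in (3) for fixed numeric $\varepsilon$.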

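Furthermore, as $n \to \infty$, under $H^S_\varepsilon$ with $M = M_C$ and $\varepsilon \in (\sqrt{3}/4, \sqrt{3}/3)$,

$$
\gamma_n^S(\varepsilon, r, M_C) \xrightarrow{\mathcal{L}}
\begin{cases}
2 + \mathrm{BER}(1 - p^S_\varepsilon) & \text{for } r = 3/2 - \varepsilon\sqrt{3}/2,\\
1 & \text{for } r > 2 - \sqrt{3}\varepsilon,\\
2 & \text{for } 3/2 - \varepsilon\sqrt{3}/2 < r < 2 - \sqrt{3}\varepsilon,\\
3 & \text{for } 1 < r < 3/2 - \varepsilon\sqrt{3}/2.
\end{cases}
\tag{8}
$$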

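Under $H^A_\varepsilon$ with $M = M_C$, the domination number $\gamma_n^A$ is nondegenerate when $r = \sqrt{3}/(2\varepsilon)$, which implies $r > 2$ for $\varepsilon \in (0, \sqrt{3}/4)$, and $r \in (3/2, 2)$ for $\varepsilon \in (\sqrt{3}/4, \sqrt{3}/3)$. In particular, the asymptotic distribution of the domination number for uniform data in one triangle is as follows. As $n \to \infty$, under $H^A_\varepsilon$ with $M = M_C$ and $\varepsilon \in (0, \sqrt{3}/3)$,

$$
\gamma_n^A(\varepsilon, r, M_C) \xrightarrow{\mathcal{L}}
\begin{cases}
2 + \mathrm{BER}(1 - p^A_\varepsilon) & \text{for } r = \sqrt{3}/(2\varepsilon),\\
1 & \text{for } r > \sqrt{3}/(2\varepsilon),\\
3 & \text{for } r < \sqrt{3}/(2\varepsilon),
\end{cases}
\tag{9}
$$

where $p^A_\varepsilon$ can be calculated similarly as in (3) for fixed numeric $\varepsilon$. However, for finite n, $\gamma_n^A(\varepsilon, r, M_C)$ is also nondegenerate for $\sqrt{3}/(2\varepsilon) - 1 < r < \sqrt{3}/(2\varepsilon)$.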
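The critical values of r quoted above for $M = M_C$ are simple functions of $\varepsilon$, and the endpoint claims can be checked mechanically. The following sketch is illustrative; the function names are hypothetical, and the exact equality case is handled with a floating-point tolerance, which is a choice of this sketch.

```python
import math

SQRT3 = math.sqrt(3.0)

def r_critical_segregation(eps):
    """r at which the limit is nondegenerate under segregation (M = M_C)."""
    return 1.5 - eps * SQRT3 / 2.0

def r_critical_association(eps):
    """r at which the limit is nondegenerate under association (M = M_C)."""
    return SQRT3 / (2.0 * eps)

def limit_under_association(eps, r, tol=1e-12):
    """Limiting value of the domination number following Equation (9)."""
    r_star = r_critical_association(eps)
    if abs(r - r_star) <= tol:
        return "2 + BER(1 - p)"
    return 1 if r > r_star else 3

# Endpoints of the epsilon ranges discussed in the text.
for eps in (SQRT3 / 4, SQRT3 / 3):
    print(eps, r_critical_segregation(eps), r_critical_association(eps))
```

At $\varepsilon = \sqrt{3}/4$ the two critical values are 9/8 and 2, and at $\varepsilon = \sqrt{3}/3$ they are 1 and 3/2, matching the intervals stated for r.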

Under segregation with general M, suppose M £ T e \ [J ye y T(y,e) (i.e., M is in the support of X 
points under H £ ). Then for fixed r = r Q for which 7„ is nondegenerate under CSR (i.e., r a is a value 
such that M £ {ti(r ),t2(r ),t 3 (r )}), then y„ is nondegenerate under H £ if r = r a (2 — 4/ U/3e)). For 

r £ (4/3, 3/2), if M £ T e \ \J y& y e T(y, e) and e > ^ ^1 - ^-J , then j n — > 1 in probability as n — > 00; and 
the same also holds if y/3 (l J < e < — ( 1 J. 7„is nondegenerate when r — r Q (2 — 4e/\/3). For 




2 V 2r, 

general M, if £ G (0, V3/4) , then 7„ is nondegenerate when r = r (l — ej \/3) . 



Under association with general M, when M ^ U y e;y T(y, e) then 7„ is nondegenerate when r = r a (i.e., 
M is not in the support of X points under Hp). If M G ij y ey ^(V' e ) then j n is nondegenerate when 
V3(r Q - 2) 
2e(r - 1) + \/3(2r - 3) ' 

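Theorem 4.2. (Stochastic Ordering) Let $\gamma_n^S(\varepsilon, r, M)$ be the domination number under the segregation alternative with $\varepsilon > 0$. Then with $\varepsilon_j \in (0, \sqrt{3}/3)$, j = 1, 2, $\varepsilon_1 > \varepsilon_2$ implies that $\gamma_n^S(\varepsilon_1, r, M) <^{ST} \gamma_n^S(\varepsilon_2, r, M)$.

Proof: Note that for $\varepsilon_1 > \varepsilon_2$ and finite n, $P(\gamma_n^S(\varepsilon_1, r, M) = 1) > P(\gamma_n^S(\varepsilon_2, r, M) = 1)$ and $P(\gamma_n^S(\varepsilon_1, r, M) = 2) > P(\gamma_n^S(\varepsilon_2, r, M) = 2)$, hence the desired result follows. ■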

Note that for Theorem 4.2 to hold in the limiting case when $r \in [1, 3/2]$ and $M \in \{t_1(r), t_2(r), t_3(r)\}$, $\varepsilon_1 \in I_i(r)$ and $\varepsilon_2 \in I_j(r)$ should hold for $i < j$, where $I_1(r) = ((2 - r)/\sqrt{3}, \sqrt{3}/3)$, $I_2(r) = ((3 - 2r)/\sqrt{3}, (2 - r)/\sqrt{3})$, and $I_3(r) = (0, (3 - 2r)/\sqrt{3})$. For $\varepsilon \in (0, \sqrt{3}/4]$, $\gamma_n^S(\varepsilon, r, M) \to 2$ in probability as $n \to \infty$, and for $\varepsilon \in (\sqrt{3}/4, \sqrt{3}/3)$, $\gamma_n^S(\varepsilon, r, M) \to 1$ in probability as $n \to \infty$.

Similarly, the stochastic ordering result of Theorem 4.2 holds for association for all $\varepsilon$ and $n < \infty$, with the inequalities being reversed.

Notice that under segregation with e G (0, \/3/4), 7n,e(V, Mc) is degenerate in the limit except for 
r = (3 — \/3e)/2. With £ G (\/3/4, a/3/3), 7 n ,e(?'", Mc) is degenerate in the limit except for r = 3 — e/VS. 
Under association with e G (0, v / 3/4), 77i, e (7, Mc) is degenerate in the limit except for r = y/3/(2e). 

Remark 4.3. The Alternatives with Multiple Triangles: In the multiple triangle case, the segregation and association alternatives, $H^S_\varepsilon$ and $H^A_\varepsilon$ with $\varepsilon \in (0, \sqrt{3}/3)$, are defined as in the one-triangle case, in the sense that, when each triangle (together with the data in it) is transformed to the standard equilateral triangle as in Theorem 3.1, we obtain the same alternative pattern described above.

Let $\gamma_{n,m}^S(\varepsilon, r, M)$ and $\gamma_{n,m}^A(\varepsilon, r, M)$ be the domination numbers under segregation and association alternatives in the multiple triangle case with m triangles, respectively. The extensions of their distributions from Equations (7), (8), and (9) are similar to the extension of the distribution of the domination number from the one-triangle to the multiple-triangle case under the null hypothesis in Section 3.2. Furthermore, the stochastic ordering result of Theorem 4.2 extends in a straightforward manner. □



4.2 The Test Statistics and Their Distributions 

A translated form of the domination number of the PCD is a test statistic for the segregation/association alternative:

$$
B_{n,m} :=
\begin{cases}
\gamma_{n,m}(r, M) - 2J_m = \sum_{j=1}^{J_m} \gamma_{n_j}(r, M^j) - 2J_m & \text{if } \gamma_{n,m}(r, M) > 2J_m,\\
0 & \text{otherwise.}
\end{cases}
\tag{10}
$$

Rejecting for extreme values of $B_{n,m}$ is appropriate, since under segregation we expect $B_{n,m}$ to be small, while under association we expect $B_{n,m}$ to be large. Using this test statistic, the critical value for finite $J_m$ and large n for the one-sided level $\alpha$ test against segregation is given by $b_\alpha$, the $(\alpha)100$th percentile of $\mathrm{BIN}(J_m, 1 - p_r)$ (i.e., the test rejects for $B_{n,m} < b_\alpha$), and against association, the test rejects for $B_{n,m} > b_{1-\alpha}$.

Similarly, the mean domination number (per triangle) of the PCD, $G(r, M) := \frac{1}{J_m} \sum_{j=1}^{J_m} \gamma_{n_j}(r, M^j)$, can also be used as a test statistic for the segregation/association alternative when $n \gg J_m$ and both n and $J_m$ are sufficiently large. Rejecting for extreme values of G(r, M) is appropriate, since under segregation we expect G(r, M) to be small, while under association we expect G(r, M) to be large. Using the standardized test statistic

$$
S_{n,m} = \sqrt{J_m}\, (G(r, M) - \mu)/\sigma,
\tag{11}
$$

where $\mu = 3 - p_r$ and $\sigma^2 = p_r(1 - p_r)$, the asymptotic critical value for the one-sided level $\alpha$ test against segregation is given by $z_\alpha = \Phi^{-1}(\alpha)$, where $\Phi(\cdot)$ is the standard normal distribution function. The test rejects for $S_{n,m} < z_\alpha$. Against association, the test rejects for $S_{n,m} > z_{1-\alpha}$.
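The standardized statistic of Equation (11) is straightforward to compute once $p_r$ is known. The sketch below is illustrative and assumes $\sigma^2 = p_r(1 - p_r)$, i.e., the per-triangle variance, so that $G(r,M)$ has variance $\sigma^2/J_m$ as in Theorem 3.4. The inputs $J_m = 13$, $p_r = 0.7413$ (the value quoted for r = 3/2), and a mean domination number of 2.000 are taken from the text; the function names are hypothetical.

```python
import math

def normal_cdf(x):
    """Standard normal distribution function Phi(x)."""
    return 0.5 * math.erfc(-x / math.sqrt(2.0))

def standardized_statistic(G, J, p):
    """S = sqrt(J) * (G - mu) / sigma with mu = 3 - p, sigma^2 = p (1 - p)."""
    mu = 3.0 - p
    sigma = math.sqrt(p * (1.0 - p))
    return math.sqrt(J) * (G - mu) / sigma

def left_sided_p_value(G, J, p):
    """p-value of the test against segregation (reject for small S)."""
    return normal_cdf(standardized_statistic(G, J, p))

S = standardized_statistic(2.0, 13, 0.7413)
print(S, left_sided_p_value(2.0, 13, 0.7413))
```

With these inputs the left-sided p-value can be compared with the normal-approximation value 0.0166 reported for the segregation realization; the right-sided p-value against association is one minus the same quantity.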



Depicted in Figure 13 are the segregation with $\delta = 3/16$, CSR, and association with $\delta = 1/4$ realizations for m = 10, $J_m = 13$, and n = 1000. The associated mean domination numbers with r = 3/2 are 2.000, 2.1538, and 3.000 for the segregation alternative, null realization, and the association alternative, respectively, yielding p-values $\approx 0.000$, 0.6139, and $\approx 0.000$ based on the binomial approximation, and p-values 0.0166, 0.3880, and < 0.0001 based on the normal approximation. We also present a Monte Carlo power investigation in Section 5 for these cases.
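The binomial computation behind the CSR realization can be sketched as follows. This is an illustration under stated assumptions rather than the paper's code: the total domination number is taken to be 28 (since $13 \times 2.1538 \approx 28$), $p_r = 0.7413$ is the value quoted for r = 3/2, and the comparison with the reported 0.6139 assumes that value is twice the lower tail, which is our reading and is not stated in the text.

```python
from math import comb

def binom_cdf(k, n, q):
    """P(BIN(n, q) <= k) computed from the exact pmf."""
    return sum(comb(n, i) * q ** i * (1.0 - q) ** (n - i) for i in range(k + 1))

def translated_statistic(gamma_total, J):
    """B_{n,m} of Equation (10): gamma - 2J if positive, else 0."""
    return max(gamma_total - 2 * J, 0)

J, p_r = 13, 0.7413
gamma_csr = 28  # assumed total domination number for the CSR realization
B = translated_statistic(gamma_csr, J)
lower_tail = binom_cdf(B, J, 1.0 - p_r)
print(B, lower_tail, 2 * lower_tail)
```

The doubled lower tail can be compared with the binomial p-value reported for the null realization. For a one-sided test against segregation, `lower_tail` itself is the relevant quantity.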



Figure 13: A realization of segregation (left), CSR (middle), and association (right) for $|\mathcal{Y}| = 10$, $J_{10} = 13$, and n = 1000.

Theorem 4.4. (Consistency-I) Let $\gamma_{n,m}^S(F, r, M)$ and $\gamma_{n,m}^A(F, r, M)$ be the domination numbers under segregation and association alternatives in the multiple triangle case with m triangles, respectively. The test against segregation with $F \in \mathscr{H}_S$ which rejects for $S_{n,m} < z_\alpha$ and the test against association with $F \in \mathscr{H}_A$ which rejects for $S_{n,m} > z_{1-\alpha}$ are consistent.

Proof: Let $F \in \mathscr{H}_S$ be given, and let $\gamma_{n,m}(\mathcal{U}, r, M)$ be the domination number for $\mathcal{X}_n$ being a random sample from $\mathcal{U}(T(\mathcal{Y}_3))$. Then $P(\gamma_{n,m}^S(F, r, M) = 1) > P(\gamma_{n,m}(\mathcal{U}, r, M) = 1)$; $P(\gamma_{n,m}^S(F, r, M) \le 2) > P(\gamma_{n,m}(\mathcal{U}, r, M) \le 2)$; and $P(\gamma_{n,m}^S(F, r, M) = 3) < P(\gamma_{n,m}(\mathcal{U}, r, M) = 3)$. Hence $S_{n,m} < 0$ with probability 1, as $n \gg m \to \infty$. Hence consistency follows from the consistency of tests which have asymptotic normality. The consistency against the association alternative can be proved similarly. ■

Below we provide a result which is stronger, in the sense that it will hold for finite m and $n \to \infty$.

Theorem 4.5. (Consistency-II) Let $\gamma_{n,m}^S(\varepsilon, r, M)$ and $\gamma_{n,m}^A(\varepsilon, r, M)$ be the domination numbers under segregation and association alternatives in the multiple triangle case with m triangles, respectively. Let $J^*(\alpha, \varepsilon) := \left\lceil \left( \frac{\sigma\, z_\alpha}{G(r,M) - \mu} \right)^2 \right\rceil$, where $\lceil \cdot \rceil$ is the ceiling function and the $\varepsilon$-dependence is through G(r,M) under a given alternative. Then the test against $H^S_\varepsilon$ which rejects for $S_{n,m} < z_\alpha$ is consistent for all $\varepsilon \in (0, \sqrt{3}/3)$ and $J_m \ge J^*(\alpha, \varepsilon)$, and the test against $H^A_\varepsilon$ which rejects for $S_{n,m} > z_{1-\alpha}$ is consistent for all $\varepsilon \in (0, \sqrt{3}/3)$ and $J_m \ge J^*(1 - \alpha, \varepsilon)$.

Proof: Let $\varepsilon > 0$. Under $H^S_\varepsilon$, $\gamma_n^S(\varepsilon, r, M)$ is degenerate in the limit as $n \to \infty$, which implies G(r, M) is a constant a.s. In particular, for $\varepsilon \in (0, \sqrt{3}/4]$, G(r, M) = 2 and for $\varepsilon \in (\sqrt{3}/4, \sqrt{3}/3)$, G(r, M) < 2 a.s. as $n \to \infty$. Then the test statistic $S_{n,m} = \sqrt{J_m}\,(G(r, M) - \mu)/\sigma$ is a constant a.s., and $J_m \ge J^*(\alpha, \varepsilon)$ implies that $S_{n,m} < z_\alpha$ a.s. Hence consistency follows for segregation.

Under $H^A_\varepsilon$, as $n \to \infty$, G(r,M) = 3 for all $\varepsilon \in (0, \sqrt{3}/3)$, a.s. Then $J_m \ge J^*(1 - \alpha, \varepsilon)$ implies that $S_{n,m} > z_{1-\alpha}$ a.s., hence consistency follows for association. ■

Consistency in the sense of Theorems 4.4 and 4.5 follows for $B_{n,m}$ similarly.

Remark 4.6. (Asymptotic Efficiency) Pitman asymptotic efficiency (PAE) provides for an investigation of "local (around $H_o$) asymptotic power". This involves the limit as $n \to \infty$ as well as the limit as $\varepsilon \to 0$. A detailed discussion of PAE is available in Kendall and Stuart (1979) and Eeden (1963). For segregation or association alternatives $H^S_\varepsilon$ and $H^A_\varepsilon$, the PAE is not applicable because the Pitman conditions (Eeden (1963)) are not satisfied by the test statistic, G(r,M).

Hodges-Lehmann asymptotic efficiency analysis (Hodges and Lehmann (1956)) and asymptotic power function analysis (Kendall and Stuart (1979)) are not applicable here either. However, when $M = M_C$ (which also implies r = 3/2), for $\varepsilon$ small and n large enough, this test is very sensitive for both alternatives because $\gamma_n^S(\varepsilon, 3/2, M_C) \to 2$ in probability as $n \to \infty$ for segregation and $\gamma_n^A(\varepsilon, 3/2, M_C) \to 3$ in probability as $n \to \infty$ for association. That is, the test statistic becomes degenerate in the limit for all $\varepsilon > 0$, but in the right direction for both alternatives. On the other hand, when $M \neq M_C$ (i.e., $r \neq 3/2$) this test is very sensitive for the segregation alternative since $\gamma_n^S(\varepsilon, r, M) \to 2$ in probability as $n \to \infty$; the same holds for the association alternative, but the test is not as sensitive as in the segregation case, since we only have $\gamma_n^A(\varepsilon, r, M) <^{ST} \gamma_n(r, M)$. □

However, the variance of 7 n (r, M) is minimized when p r = 1/2, which happens when r ~ 1.395 (obtained 
numerically). Hence, we expect the test to have higher power under the alternatives for r around 1.40. 
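The value of r at which $p_r = 1/2$ can be located numerically by bisection. The sketch below is self-contained and illustrative. As assumptions of this sketch (not statements from the paper), it takes the exponent in Equation (3) to be negative and evaluates $p_r$ through the one-dimensional form $p_r = \int_0^{\pi/2} \sin(2\theta)/(1 + r(r-1)\sin(2\theta))^2\, d\theta$, obtained by rescaling and polar coordinates; it also relies on $p_r$ being decreasing in r on [1, 3/2), which follows from that form.

```python
import math

def p_r(r, n_steps=2000):
    """Approximate p_r for r in [1, 3/2) by composite Simpson's rule."""
    c = r * (r - 1.0)

    def integrand(theta):
        s = math.sin(2.0 * theta)
        return s / (1.0 + c * s) ** 2

    h = (math.pi / 2.0) / n_steps
    total = integrand(0.0) + integrand(math.pi / 2.0)
    for k in range(1, n_steps):
        total += (4.0 if k % 2 else 2.0) * integrand(k * h)
    return total * h / 3.0

def solve_p_r(target=0.5, lo=1.0, hi=1.4999, tol=1e-8):
    """Bisection for p_r(r) = target, using that p_r decreases in r."""
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if p_r(mid) > target:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

r_half = solve_p_r()
print(r_half, p_r(r_half))
```

The root can be compared with the value $r \approx 1.395$ quoted above.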

Remark 4.7. The choice of the null pattern in Section 3.2 and the conditions in Theorem 3.4 seem to be somewhat stringent; i.e., X points are assumed to be uniformly distributed in the convex hull of $\mathcal{Y}$ points, which might not be realistic in practice. However, if the supports of the distributions of X and $\mathcal{Y}$ points do not intersect, or only mildly intersect, then it is clear that the null hypothesis is violated (i.e., the two classes are segregated), which is easily detected by the test statistics $B_{n,m}$ or $S_{n,m}$ (see Equations (10) and (11)), as they tend to be smaller under segregation than expected under CSR. When their supports have non-empty intersection, then either the X points are segregated from the $\mathcal{Y}$ points, or follow CSR, or are associated with the $\mathcal{Y}$ points in this intersection. If we only consider the $\mathcal{Y}$ points in this support intersection, then our inference will be local (i.e., restricted to this intersection). If one takes all of the $\mathcal{Y}$ points, then our inference will be a global one (i.e., for the entire support of $\mathcal{Y}$ points). □



5 Monte Carlo Simulation Analysis 
5.1 Empirical Size Analysis under CSR 

For the null pattern of CSR, we generate n X points iid $\mathcal{U}(C_H(\mathcal{Y}_{10}))$ where $\mathcal{Y}_{10}$ is the set of the 10 $\mathcal{Y}$ points in Figure 13. We calculate and record the domination number $\gamma_n(r, M)$ and the mean domination number (per triangle), G(r, M), for r = 1.00, 1.01, 1.02, ..., 1.49 at each Monte Carlo replicate. We repeat the Monte Carlo procedure $N_{mc} = 1000$ times for each of n = 500, 1000, 2000. Using the critical values based on the binomial distribution for the domination number and the normal approximation for G(r,M), we calculate the empirical size estimates for both right- and left-sided tests. Empirical sizes significantly smaller (larger) than .05 are deemed conservative (liberal). The asymptotic normal approximation to proportions is used in determining the significance of the deviations of the empirical sizes from .05. For these proportion tests, we also use $\alpha = .05$ as the significance level. With $N_{mc} = 1000$, empirical sizes less than .039 are deemed conservative, and those greater than .061 are deemed liberal at the $\alpha = .05$ level. The empirical sizes together with upper and lower limits of liberalness and conservativeness are plotted in Figure 14. Observe that the right-sided tests are liberal, becoming less liberal as the sample size n increases, and have about the nominal level for most r values between 1.1 and 1.4. The left-sided test tends to be liberal for small r and conservative for large r, but has about the desired nominal level for r around 1.2 and 1.3.
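The thresholds .039 and .061 can be reproduced with the normal approximation to a binomial proportion. The sketch below is illustrative and assumes a one-sided normal quantile at level .05 for each direction (about 1.645); that choice is our inference from the quoted thresholds and is not spelled out in the text. The function name is hypothetical.

```python
import math
from statistics import NormalDist

def size_thresholds(alpha=0.05, n_mc=1000, test_level=0.05):
    """Bounds outside which an empirical size is called conservative/liberal.

    Uses the normal approximation to the proportion of rejections among
    n_mc replicates, with a one-sided quantile at test_level (assumed).
    """
    z = NormalDist().inv_cdf(1.0 - test_level)
    se = math.sqrt(alpha * (1.0 - alpha) / n_mc)
    return alpha - z * se, alpha + z * se

lower, upper = size_thresholds()
print(round(lower, 3), round(upper, 3))
```

With `n_mc=1000` the bounds round to the values used in the text; increasing the number of replicates narrows the band in proportion to $1/\sqrt{N_{mc}}$.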

Since $p_r$ has a different form when r = 1.50, we estimate the empirical sizes for r = 1.50 separately. The size estimates for n = 500, 1000, and 2000 relative to segregation and association alternatives are presented in Table 4. Based on the Monte Carlo simulations under CSR, the use of the domination number for $r \in (1.45, 1.50)$ is not recommended, as the test is extremely liberal for the segregation (i.e., left-sided) alternative, while it is extremely conservative for the association (i.e., right-sided) alternative. This deviation from the nominal level is due to the fact that for $r \in (1.45, 1.50)$ much larger sample sizes are required for the binomial and the normal approximations to hold. Instead of $r \in (1.45, 1.50)$, we recommend the use of r = 3/2 with the asymptotic distribution provided in Ceyhan and Priebe (2005).

Figure 14: The empirical size estimates for the left-sided alternative (i.e., relative to segregation) and the right-sided alternative (i.e., relative to association) with n = 500 (left), n = 1000 (middle), and n = 2000 (right) under the CSR pattern. The empirical sizes based on the binomial distribution are plotted in circles and joined with solid lines, and those based on the normal approximation are plotted in triangles and joined with dashed lines. The horizontal lines are located at .039 (upper threshold for conservativeness), .050 (nominal level), and .061 (lower threshold for liberalness).



5.2 Empirical Power Analysis under the Alternatives 

To compare the distribution of the test statistic under CSR and under the segregation and association alternatives, we generate n points iid $\mathcal{U}(C_H(\mathcal{Y}_m))$ under CSR, iid uniformly on the support that corresponds to $H^S_{\sqrt{3}/8}$ for each triangle based on the same $\mathcal{Y}_m$ points, and iid uniformly on the support that corresponds to $H^A_{\sqrt{3}/21}$ for each triangle based on the same $\mathcal{Y}_m$ points. Under each case, we generate n = 1000 points with $J_{10} = 13$ and n = 5000 points with $J_{20} = 30$ for 500 Monte Carlo replicates. The kernel density estimates of $G(r = 3/2, M = M_C)$ are presented in Figures 15 and 16. In Figure 15, we observe empirically that even under mild segregation we obtain considerable separation between the kernel density estimates under the null and segregation cases for moderate $J_m$ and n values, suggesting high power at $\alpha = .05$. A similar result is observed for association. With $J_{10} = 13$ and n = 1000, under $H_o$, the estimated significance level is $\hat{\alpha} = .09$ relative to segregation, and $\hat{\alpha} = .07$ relative to association. Under $H^S_{\sqrt{3}/8}$, the empirical power (using the asymptotic critical value) is $\hat{\beta} = .97$, and under $H^A_{\sqrt{3}/21}$, $\hat{\beta} = 1.00$. With $J_{20} = 30$ and n = 5000, under $H_o$, the estimated significance level is $\hat{\alpha} = .06$ relative to segregation, and $\hat{\alpha} = .04$ relative to association. The empirical power is $\hat{\beta} = 1.00$ for both alternatives.

We also estimate the empirical power by using the empirical critical values. With $J_{10} = 13$ and n = 1000, under $H^S_{\sqrt{3}/8}$, the empirical power is $\hat{\beta}_{mc} = .72$ at empirical level $\hat{\alpha}_{mc} = .033$, and under $H^A_{\sqrt{3}/21}$ the empirical power is $\hat{\beta}_{mc} = 1.00$ at empirical level $\hat{\alpha}_{mc} = .03$. With $J_{20} = 30$ and n = 5000, under $H^S_{\sqrt{3}/8}$, the empirical power is $\hat{\beta}_{mc} = 1.00$ at empirical level $\hat{\alpha}_{mc} = .034$, and under $H^A_{\sqrt{3}/21}$ the empirical power is $\hat{\beta}_{mc} = 1.00$ at empirical level $\hat{\alpha}_{mc} = .04$.




Figure 15: Two Monte Carlo experiments against the segregation alternatives with $\delta = 1/16$. Depicted are kernel density estimates of $G(r = 3/2, M = M_C)$ for $J_{10} = 13$ and n = 1000 with 1000 replicates (left) and $J_{20} = 30$ and n = 5000 with 1000 replicates (right) under the null (solid) and segregation alternative (dashed).

In Figure 16 we observe that even in mild association we obtain considerable separation for moderate $J_m$ and n values, suggesting high power (with $J_{10} = 13$ and n = 1000, the empirical critical value is 2.46, $\hat{\alpha} = .034$, and the empirical power is $\hat{\beta} = 1.0$; with $J_{20} = 30$ and n = 5000, the empirical critical value is 2.36, $\hat{\alpha} = .04$, and the empirical power is $\hat{\beta} = 1.0$).




Figure 16: Two Monte Carlo experiments against the association alternatives $H^A_{\sqrt{3}/21}$, i.e., $\delta = 16/49$. Depicted are kernel density estimates of $G(r = 3/2, M = M_C)$ for $J_{10} = 13$ and n = 1000 with 500 replicates (left) and $J_{20} = 30$ and n = 5000 with 100 replicates (right) under the null (solid) and association alternative (dashed).

For the segregation alternatives, we consider the following three cases: $\varepsilon = \sqrt{3}/8$, $\varepsilon = \sqrt{3}/4$, and $\varepsilon = 2\sqrt{3}/7$ in the 13 Delaunay triangles obtained by the 10 $\mathcal{Y}$ points in Figure 1. We generate n = 500, 1000, 2000, 5000 points in the convex hull of $\mathcal{Y}_{10}$ at each Monte Carlo replication. We estimate the empirical power of the tests for r = 1.00, 1.01, 1.02, ..., 1.49 using $N_{mc} = 1000$ replicates. The power estimates based on the binomial distribution and normal approximation under $H^S_{\sqrt{3}/8}$ for n = 1000, 2000, 5000 are plotted in Figure 17. Observe that the power estimates are about 1.0 for r > 1.15. Considering the empirical size and power estimates together, we recommend r values around 1.22 or 1.30 for the segregation alternatives.

Figure 17: The empirical power estimates under segregation with $\varepsilon = \sqrt{3}/8$, $\varepsilon = \sqrt{3}/4$ and $n = 1000$ (left),
$n = 2000$ (middle), and $n = 5000$ (right). The power estimates based on the binomial distribution are
plotted in circles (o) and joined with solid lines, and those based on the normal approximation are plotted
in triangles and joined with dashed lines.

For the association alternatives, we consider the following three cases: $\varepsilon = 5\sqrt{3}/24$, $\varepsilon = \sqrt{3}/12$, and $\varepsilon = \sqrt{3}/21$
in the 13 Delaunay triangles obtained by the 10 $\mathcal{Y}$ points in Figure 1. We generate $n = 500, 1000, 2000, 5000$
$\mathcal{X}$ points in the convex hull of $\mathcal{Y}_{10}$ at each Monte Carlo replication. We estimate the empirical power of the tests for
$r = 1.00, 1.01, 1.02, \ldots, 1.49$ using $N_{mc} = 1000$ replicates. The power estimates based on the binomial
distribution and normal approximation under $H^A_{5\sqrt{3}/24}$ for $n = 1000, 2000, 5000$ are plotted in Figure 18.

Observe that the power estimates are about 1.0 for $r > 1.33$, but the power performance is poor for $r$
between 1.1 and 1.33. Considering the empirical size and power estimates together, we recommend $r$ values
around 1.35 for the association alternatives.

The empirical power estimates for $r = 3/2$ and $M = M_C$ are presented in Table 4.

6 Correction for $\mathcal{X}$ Points Outside the Convex Hull of $\mathcal{Y}_m$

Our null hypothesis in (4) is rather restrictive, in the sense that it might not be realistic to assume
the support of $\mathcal{X}$ to be $C_H(\mathcal{Y}_m)$ in practice. Up to now, our inference has been restricted to $C_H(\mathcal{Y}_m)$.
However, crucial information from the data (hence power) might be lost, since a substantial proportion of $\mathcal{X}$
points, denoted $\pi_{out}$, might fall outside $C_H(\mathcal{Y}_m)$. We investigate the effect of $\pi_{out}$ (or restriction to
$C_H(\mathcal{Y}_m)$) on our tests and propose an empirical correction to mitigate this based on an extensive Monte
Carlo simulation study.

Figure 18: The empirical power estimates under association with $\varepsilon = 5\sqrt{3}/24$, $\varepsilon = \sqrt{3}/12$ and $n = 1000$
(left), $n = 2000$ (middle), and $n = 5000$ (right). The power estimates based on the binomial distribution are
plotted in circles (o) and joined with solid lines, and those based on the normal approximation are plotted
in triangles and joined with dashed lines.

We consider the following six cases to investigate how the removal of points outside $C_H(\mathcal{Y}_m)$ affects the
empirical size and power performance of the tests. We only consider $r = 1.35$ and $r = 1.5$, which have better
size and power performance than the other values. In each case, at each Monte Carlo replication, we generate
$\mathcal{X}_n$ and $\mathcal{Y}_m$ independently as random samples from $U(S_X)$ and $U(S_Y)$, respectively, for various values of $n$
and $m$, where $S_X$ and $S_Y$ are the support sets of $\mathcal{X}$ and $\mathcal{Y}$ points, respectively. We take $S_Y = (0, 1) \times (0, 1)$
and manipulate $S_X$ in each case to simulate CSR and various forms of deviations from CSR. We repeat the
generation procedure $N_{mc}$ times for each combination of $m$ and $n$. At each Monte Carlo replication, we
record the number of $\mathcal{X}$ points outside $C_H(\mathcal{Y}_m)$ and the domination number, $\gamma_{n,m}(r)$.

Case 1: In this case, we also set $S_X = (0, 1) \times (0, 1)$.

Case 2: $S_X = (-\delta, 1 + \delta) \times (-\delta, 1 + \delta)$ for $\delta \in \{.1, .25, .5\}$.

Case 3: $S_X = (0, 1) \times (0, 1 + \delta)$ for $\delta \in \{.1, .25, .5\}$.

Case 4: $S_X = (0, 1) \times (\delta, 1 + \delta)$ for $\delta \in \{.1, .25, .5\}$.

Case 5: Given a realization of $\mathcal{Y}$ points, $\mathcal{Y}_m = \{y_1, y_2, \ldots, y_m\}$, from $U(S_Y = (0, 1) \times (0, 1))$, $S_X = [(-\delta, 1 + \delta) \times (-\delta, 1 + \delta)] \setminus \bigcup_{i=1}^{m} B(y_i, \varepsilon)$ with $\delta = \frac{1}{2\sqrt{\lambda}} = \frac{1}{2\sqrt{m}}$, which is the expected interpoint distance in a homogeneous Poisson process with intensity (expected number of points per unit area) $\lambda$ (Dixon (2002b)), and $\varepsilon = \delta/k$ for $k = 1.5, 2.0$.

Case 6: Given a realization of $\mathcal{Y}$ points, $\mathcal{Y}_m = \{y_1, y_2, \ldots, y_m\}$, from $U(S_Y)$, $S_X = \bigcup_{i=1}^{m} B(y_i, \varepsilon)$ with $\varepsilon = \delta/k$, $\delta = \frac{1}{2\sqrt{m}}$, and $k = 1.0, 1.5$.
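The sampling scheme for the rectangular supports (Cases 1 to 4) can be sketched in a few lines of Python. This is only a minimal illustration under stated assumptions, not the simulation code behind the reported results: the function names, the number of replications, and the seed are hypothetical choices, and Cases 5 and 6 (supports built from balls around the $\mathcal{Y}$ points) are left out.

```python
import random

def convex_hull(points):
    """Andrew's monotone chain; returns hull vertices in counter-clockwise order."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts

    def cross(o, a, b):
        return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

    lower, upper = [], []
    for p in pts:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    return lower[:-1] + upper[:-1]

def inside_hull(hull, q):
    """True if q lies inside or on the counter-clockwise convex polygon `hull`."""
    k = len(hull)
    for i in range(k):
        a, b = hull[i], hull[(i + 1) % k]
        # A negative cross product means q is to the right of edge a -> b.
        if (b[0] - a[0]) * (q[1] - a[1]) - (b[1] - a[1]) * (q[0] - a[0]) < 0:
            return False
    return True

def support_x(case, delta=0.0):
    """Rectangular support of X as ((x_lo, x_hi), (y_lo, y_hi)) for Cases 1-4."""
    if case == 1:
        return (0.0, 1.0), (0.0, 1.0)
    if case == 2:
        return (-delta, 1.0 + delta), (-delta, 1.0 + delta)
    if case == 3:
        return (0.0, 1.0), (0.0, 1.0 + delta)
    if case == 4:
        return (0.0, 1.0), (delta, 1.0 + delta)
    raise ValueError("this sketch only covers the rectangular Cases 1-4")

def mean_proportion_outside(case, n, m, delta=0.0, n_rep=200, seed=0):
    """Monte Carlo mean of the proportion of X points outside the hull of Y."""
    rng = random.Random(seed)
    (x_lo, x_hi), (y_lo, y_hi) = support_x(case, delta)
    total = 0.0
    for _ in range(n_rep):
        # Y points are always uniform on the unit square.
        hull = convex_hull([(rng.random(), rng.random()) for _ in range(m)])
        outside = 0
        for _ in range(n):
            q = (rng.uniform(x_lo, x_hi), rng.uniform(y_lo, y_hi))
            if not inside_hull(hull, q):
                outside += 1
        total += outside / n
    return total / n_rep
```

For example, `mean_proportion_outside(1, n=200, m=10)` estimates the benchmark proportion for Case 1, and `mean_proportion_outside(2, n=200, m=10, delta=0.5)` does the same for the enlarged support of Case 2.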

Notice that in Case 1 both $\mathcal{X}_n$ and $\mathcal{Y}_m$ have the same support. By construction the two classes follow CSR
independence with very different relative abundances (i.e., the number of $\mathcal{X}$ points being larger than the number of
$\mathcal{Y}$ points). In Cases 2 and 3 the support of $\mathcal{X}_n$ contains (and is larger than) the support of $\mathcal{Y}_m$, which suggests
segregation of $\mathcal{X}$ points from $\mathcal{Y}$ points, at least when we move away from the support of $\mathcal{Y}$ points (which is
the unit square). However, when we restrict our attention to $C_H(\mathcal{Y}_m)$ or the unit square, we have CSR or
CSR independence, respectively. Furthermore, the larger the $\delta$ value, the larger the level of segregation of
$\mathcal{X}$ from $\mathcal{Y}$. In Case 4 the supports of $\mathcal{X}_n$ and $\mathcal{Y}_m$ overlap, but neither is a subset of the other, which suggests
segregation between $\mathcal{X}$ and $\mathcal{Y}$ points. When we restrict our attention to $C_H(\mathcal{Y}_m)$, there is still segregation
between $\mathcal{X}$ and $\mathcal{Y}$ points. Furthermore, the larger the $\delta$ value, the larger the level of segregation between $\mathcal{X}$
and $\mathcal{Y}$ points. In Case 5, $\mathcal{X}$ points are segregated from $\mathcal{Y}$ points both inside and outside $C_H(\mathcal{Y}_m)$. Furthermore,
the larger the $\delta$ value, the larger the level of segregation of $\mathcal{X}$ points from $\mathcal{Y}$ points. Finally, in Case 6 $\mathcal{X}$
points are associated with $\mathcal{Y}$ points. Furthermore, the smaller the $\delta$ value, the larger the level of association
of $\mathcal{X}$ points with $\mathcal{Y}$ points.

Empirical Size and Power Estimates for $r = 3/2$ and $M = M_C$

   n     $\alpha_S$   $\alpha_A$   $\beta^S_1$   $\beta^S_2$   $\beta^S_3$   $\beta^A_1$   $\beta^A_2$   $\beta^A_3$
  500    0.161   0.062   0.961   1.000   1.000   1.000   1.000   0.997
 1000    0.071   0.082   0.975   1.000   1.000   1.000   1.000   1.000
 2000    0.049   0.081   0.995   1.000   1.000   1.000   1.000   1.000

Table 4: The empirical size and power estimates for $r = 3/2$ and $M = M_C$ under the null and alternatives.
$n$ stands for the number of $\mathcal{X}$ points, $\alpha_S$ for empirical size relative to segregation, $\alpha_A$ for empirical size relative to
association, $\beta^S_1$, $\beta^S_2$, and $\beta^S_3$ for empirical power estimates under $H^S_\varepsilon$ with $\varepsilon = \sqrt{3}/8$, $\varepsilon = \sqrt{3}/4$, and $\varepsilon = 2\sqrt{3}/7$,
respectively, and $\beta^A_1$, $\beta^A_2$, and $\beta^A_3$ for empirical power estimates under $H^A_\varepsilon$ with $\varepsilon = 5\sqrt{3}/24$, $\varepsilon = \sqrt{3}/12$,
and $\varepsilon = \sqrt{3}/21$, respectively.

In Case 1 (i.e., the benchmark case), we consider $n = 100, 200, \ldots, 900, 1000, 2000, \ldots, 9000, 10000$ for
each of $m = 10, 20, \ldots, 50$. We generate $N_{mc} = 1000$ replications for each $n, m$ combination. In the other
cases, we consider $n = 100, 500, 1000$ for $m = 10$ and $n = 500, 1000$ for $m = 20$, and we generate $N_{mc} = 10000$
replications for each $n, m$ combination.

In Cases 1-6, we estimate the proportion of $\mathcal{X}$ points outside $C_H(\mathcal{Y}_m)$. For each $m, n$ combination
we average (over $n$) this proportion, which is denoted as $\pi_{out}$. We present the estimated (mean) proportions
$\pi_{out}$ for Case 1 in Table 5 and for Cases 2-6 in Table 6. Observe that in Cases 2-5, $\pi_{out}$ values are larger
than those in Case 1, while in Case 6, $\pi_{out}$ values are smaller than those in Case 1.



$m$            10     20     30     40     50
$\pi_{out}$    0.56   0.37   0.29   0.23   0.20
$\pi_{fit}$    0.57   0.36   0.28   0.24   0.21

Table 5: The (mean) proportion of $\mathcal{X}$ points outside $C_H(\mathcal{Y}_m)$, which is denoted as $\pi_{out}$, and the fitted
values $\pi_{fit}$ for various $m$ values in Case 1.



$\pi_{out}$ values for Case 2
$\delta$      0.1     0.25    0.50
m = 10     0.697   0.806   0.891
m = 20     0.566   0.722   0.843

$\pi_{out}$ values for Case 3
$\delta$      0.1     0.25    0.50
m = 10     0.604   0.652   0.740
m = 20     0.431   0.499   0.582

$\pi_{out}$ values for Case 4
$\delta$      0.1     0.25    0.50
m = 10     0.573   0.629   0.782
m = 20     0.395   0.488   0.687

$\pi_{out}$ values for Case 5
$k$         1.5     2.0
m = 10     0.806   0.783
m = 20     0.652   0.611

$\pi_{out}$ values for Case 6
$k$         1.0     1.5
m = 10     0.535   0.479
m = 20     0.358   0.310

Table 6: The (mean) proportion of $\mathcal{X}$ points outside $C_H(\mathcal{Y}_m)$ for various $\delta$ and $m$ values in Cases 2-4
and various $k$ and $m$ values in Cases 5-6.

For Case 1, we model the relationship between $\pi_{out}$ and $m$. Our simulation results suggest that $\pi_{out} \approx
1.7932/m + 1.2229/\sqrt{m}$. We present the actual fitted values, denoted $\pi_{fit}$, based on this model in Table 5.
See also Figure 19 for the plot of estimated $\pi_{out}$ values versus fitted values based on our model.

Figure 19: The proportion of $\mathcal{X}$ points outside $C_H(\mathcal{Y}_m)$ as a function of $m$. The solid line is the fitted line
based on $\pi_{out} \approx 1.7932/m + 1.2229/\sqrt{m}$.
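As a quick arithmetic check, the fitted model can be evaluated at the $m$ values used in Case 1 and set beside the simulated means. The sketch below only re-evaluates the stated formula; the function name `expected_pi_out` is a hypothetical label, and the simulated values are copied from Table 5.

```python
import math

def expected_pi_out(m):
    """Fitted Case 1 model for the proportion of X points outside the convex hull."""
    return 1.7932 / m + 1.2229 / math.sqrt(m)

# Simulated means reported in Table 5 for m = 10, 20, ..., 50.
simulated = {10: 0.56, 20: 0.37, 30: 0.29, 40: 0.23, 50: 0.20}

for m, pi_sim in simulated.items():
    pi_fit = expected_pi_out(m)
    print(f"m={m:2d}  simulated={pi_sim:.2f}  fitted={pi_fit:.2f}")
```

Rounded to two decimals, the fitted values agree with the simulated means to within 0.01 at every $m$ considered.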

Based on our Monte Carlo simulation results we propose a coefficient to adjust for the proportion of $\mathcal{X}$
points outside $C_H(\mathcal{Y}_m)$, namely,

$$C_{ch} := 1 - \left(p_{out} - E[\pi_{out}]\right) \tag{12}$$

where $p_{out}$ is the observed and $E[\pi_{out}] \approx 1.7932/m + 1.2229/\sqrt{m}$ is the expected (under the conditions
stated in Case 1) proportion of $\mathcal{X}$ points outside $C_H(\mathcal{Y}_m)$. For the binomial test statistic in Equation (10),
we suggest

$$B_{ch} := \begin{cases} \left(\gamma_n(r, M) - 2 J_m\right) \cdot C_{ch} = \left(\sum_{i=1}^{J_m} \gamma_{n_i}(r, M) - 2 J_m\right) \cdot C_{ch} & \text{if } \gamma_n(r, M) \cdot C_{ch} > 2 J_m, \\ \ldots & \text{otherwise.} \end{cases} \tag{13}$$

For the mean domination number (per triangle) of the PCD, we suggest

$$S_{ch} := S_{n,m} \cdot C_{ch}. \tag{14}$$



This (convex hull) adjustment slightly affects the empirical size estimates in Case 1, since $p_{out}$ and $E[\pi_{out}]$
values are very similar. In Cases 2-5, there is segregation when all data points are considered, and $p_{out}$
values tend to be larger than $E[\pi_{out}]$ values, and in Case 6 (which is the simulation of the association case),
$p_{out}$ values tend to be smaller than $E[\pi_{out}]$ values. Hence in Cases 2-6, the adjustment seems to correct the
power estimates in the desired direction, thereby increasing the power estimates.
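The direction of the adjustment is easy to see numerically. The following sketch evaluates the coefficient of Equation (12) for two of the simulated settings; the function names are hypothetical, and the observed proportions are the Table 6 means for $m = 10$, used here as stand-ins for $p_{out}$ from a single data set.

```python
import math

def expected_pi_out(m):
    """E[pi_out] under the Case 1 conditions, from the fitted model."""
    return 1.7932 / m + 1.2229 / math.sqrt(m)

def convex_hull_coefficient(p_out, m):
    """C_ch = 1 - (p_out - E[pi_out]), as in Equation (12)."""
    return 1.0 - (p_out - expected_pi_out(m))

# Case 2 with delta = 0.5 (segregation): more points outside than expected.
c_seg = convex_hull_coefficient(0.891, m=10)
# Case 6 with k = 1.5 (association): fewer points outside than expected.
c_assoc = convex_hull_coefficient(0.479, m=10)
print(round(c_seg, 3), round(c_assoc, 3))
```

The coefficient falls below 1 in the segregation setting and rises above 1 in the association setting, which is the sense in which the correction moves the test statistic toward the respective alternative.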



7 Correction for Small Samples 

The distributional results in Equations (2) and (5) might require large $n$ for the convergence to hold. In
particular, it might be necessary for the number of $\mathcal{X}$ points per Delaunay triangle to be larger than 100 as
a practical guide, which implies that very large samples from $\mathcal{X}$ are needed for a large number of $\mathcal{Y}$ points. Hence
it might be necessary to propose a correction in the test statistics for small $n$ also. Based on our extensive
Monte Carlo simulations (of Case 1 above) we suggest that the test statistic $S_{n,m}$ in Equation (11) can be
adjusted as $S^{adj}_{n,m} := \frac{S_{n,m} - a_{n,m}}{b_{n,m}}$. We provide the explicit forms of $a_{n,m}$ and $b_{n,m}$ for $m = 10, 20, \ldots, 50$ in
Table 7. For example, for $m = 10$, $S_{n,m}$ in Equation (11) can be adjusted as $S^{adj}_{n,m} = \frac{S_{n,m} - a_{n,m}}{b_{n,m}}$, where

$$a_{n,m} = -8.80/(n/J_m) - 30.94/\sqrt{n/J_m} + 9.09/\sqrt[3]{n/J_m}$$

and

$$b_{n,m} = 1 - 18.81/(n/J_m) + 16.26/\sqrt{n/J_m} - 4.42/\sqrt[3]{n/J_m}.$$

Observe that, as expected, $S^{adj}_{n,m}$ converges to $S_{n,m}$ as $n \to \infty$ for each $m$ value considered provided $n/J_m \to \infty$,
which is a requirement in our testing framework.


r = 1.5

m = 10:  $a_{n,m} = -8.80/(n/J_m) - 30.94/\sqrt{n/J_m} + 9.09/\sqrt[3]{n/J_m}$
         $b_{n,m} = 1 - 18.81/(n/J_m) + 16.26/\sqrt{n/J_m} - 4.42/\sqrt[3]{n/J_m}$
m = 20:  $a_{n,m} = 10.19/(n/J_m) - 58.15/\sqrt{n/J_m} + 20.27/\sqrt[3]{n/J_m}$
         $b_{n,m} = 1 - 11.16/(n/J_m) + 11.71/\sqrt{n/J_m} - 3.24/\sqrt[3]{n/J_m}$
m = 30:  $a_{n,m} = 18.72/(n/J_m) - 77.??/\sqrt{n/J_m} + 28.46/\sqrt[3]{n/J_m}$
         $b_{n,m} = 1 - 6.85/(n/J_m) + 7.56/\sqrt{n/J_m} - 1.62/\sqrt[3]{n/J_m}$
m = 40:  $a_{n,m} = 28.11/(n/J_m) - 99.66/\sqrt{n/J_m} + 38.73/\sqrt[3]{n/J_m}$
         $b_{n,m} = 1 - 5.23/(n/J_m) + 5.81/\sqrt{n/J_m} - 0.92/\sqrt[3]{n/J_m}$
m = 50:  $a_{n,m} = 33.37/(n/J_m) - 115.58/\sqrt{n/J_m} + 46.03/\sqrt[3]{n/J_m}$
         $b_{n,m} = 1 - 3.93/(n/J_m) + 3.88/\sqrt{n/J_m} + 0.03/\sqrt[3]{n/J_m}$

r = 1.35

m = 10:  $a_{n,m} = -0.13/(n/J_m) - 34.35/\sqrt{n/J_m} + 8.79/\sqrt[3]{n/J_m}$
         $b_{n,m} = 1 - 16.29/(n/J_m) + 13.43/\sqrt{n/J_m} - 3.43/\sqrt[3]{n/J_m}$
m = 20:  $a_{n,m} = 16.05/(n/J_m) - 58.95/\sqrt{n/J_m} + 18.01/\sqrt[3]{n/J_m}$
         $b_{n,m} = 1 - 10.49/(n/J_m) + 10.70/\sqrt{n/J_m} - 3.04/\sqrt[3]{n/J_m}$
m = 30:  $a_{n,m} = 24.22/(n/J_m) - 77.98/\sqrt{n/J_m} + 25.78/\sqrt[3]{n/J_m}$
         $b_{n,m} = 1 - 5.59/(n/J_m) + 5.52/\sqrt{n/J_m} - 0.82/\sqrt[3]{n/J_m}$
m = 40:  $a_{n,m} = 30.66/(n/J_m) - 95.07/\sqrt{n/J_m} + 32.91/\sqrt[3]{n/J_m}$
         $b_{n,m} = 1 - 4.02/(n/J_m) + 3.57/\sqrt{n/J_m} - 0.06/\sqrt[3]{n/J_m}$
m = 50:  $a_{n,m} = 34.49/(n/J_m) - 107.87/\sqrt{n/J_m} + 38.18/\sqrt[3]{n/J_m}$
         $b_{n,m} = 1 - 3.07/(n/J_m) + 2.55/\sqrt{n/J_m} + 0.42/\sqrt[3]{n/J_m}$

Table 7: The finite sample adjustment for $S_{n,m}$ in Equation (11) as $S^{adj}_{n,m} := \frac{S_{n,m} - a_{n,m}}{b_{n,m}}$ with $m = 10, 20, \ldots, 50$
and $n = 100, 200, \ldots, 1000, 2000, \ldots, 10000$ for $r = 1.5$ (top) and $r = 1.35$ (bottom).
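The adjustment is a simple location and scale correction, which the sketch below applies for the $m = 10$, $r = 1.5$ row. Two assumptions should be flagged: the function names are hypothetical, and the third term of each expression is read here as a cube root of $n/J_m$, which is an interpretation of the table rather than something confirmed independently.

```python
import math

def adjustment_terms_m10_r15(n, j_m):
    """a_{n,m} and b_{n,m} for m = 10 and r = 1.5, as read from Table 7."""
    x = n / j_m  # number of X points per Delaunay triangle
    # Assumption: the last term in each expression uses a cube root of x.
    a = -8.80 / x - 30.94 / math.sqrt(x) + 9.09 / x ** (1.0 / 3.0)
    b = 1.0 - 18.81 / x + 16.26 / math.sqrt(x) - 4.42 / x ** (1.0 / 3.0)
    return a, b

def adjusted_statistic(s, n, j_m):
    """Finite-sample adjusted statistic (S - a) / b."""
    a, b = adjustment_terms_m10_r15(n, j_m)
    return (s - a) / b

# With 13 Delaunay triangles, the correction is sizeable for n = 100
# and fades as n / J_m grows.
for n in (100, 1000, 10000, 10**9):
    a, b = adjustment_terms_m10_r15(n, 13)
    print(n, round(a, 3), round(b, 3))
```

As $n/J_m$ grows, `a` tends to 0 and `b` tends to 1, so the adjusted statistic approaches the unadjusted one, in line with the convergence requirement stated for the testing framework.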

8 Extension of $N_{PE}^r$ to Higher Dimensions

The extension to $\mathbb{R}^d$ for $d > 2$ with $M = M_C$ is provided in Ceyhan and Priebe (2005); the extension for
general $M$ is similar. Let $\mathcal{Y} = \{y_1, y_2, \ldots, y_{d+1}\}$ be $d + 1$ non-coplanar points. Denote the simplex formed by
these $d + 1$ points as $S(\mathcal{Y})$. For $r \in [1, \infty]$, define the $r$-factor proximity map as follows. Given a point $x$ in
$S(\mathcal{Y})$, let $Q_y(M, x)$ be the polytope with vertices being the $d(d+1)/2$ points on the edges, the vertex $y$ and $x$, so
that the faces of $Q_y(M, x)$ are formed by $d - 1$ line segments, each of which joins one of the $\mathcal{Y}$ points, say $y_i$, to $M$
and lies between $M$ and the face opposite $y_i$. That is, the vertex region for vertex $v$ is the polytope with
vertices given by $v$ and such points on the edges. Let $v(x)$ be the vertex in whose region $x$ falls. If $x$ falls on the
boundary of two vertex regions, we assign $v(x)$ arbitrarily. Let $\varphi(x)$ be the face opposite to vertex $v(x)$, and
$\eta(v(x), x)$ be the hyperplane parallel to $\varphi(x)$ which contains $x$. Let $d(v(x), \eta(v(x), x))$ be the (perpendicular)
Euclidean distance from $v(x)$ to $\eta(v(x), x)$. For $r \in [1, \infty)$, let $\eta_r(v(x), x)$ be the hyperplane parallel to $\varphi(x)$
such that $d(v(x), \eta_r(v(x), x)) = r\, d(v(x), \eta(v(x), x))$ and $d(\eta(v(x), x), \eta_r(v(x), x)) < d(v(x), \eta_r(v(x), x))$. Let
$S_r(x)$ be the polytope similar to and with the same orientation as $S(\mathcal{Y})$ having $v(x)$ as a vertex and $\eta_r(v(x), x)$
as the opposite face. Then the $r$-factor proximity region is $N_{PE}^r(x) := S_r(x) \cap S(\mathcal{Y})$. Also, let $\xi_j(x)$ be the
hyperplane such that $\xi_j(x) \cap S(\mathcal{Y}) \neq \emptyset$ and $r\, d(y_j, \xi_j(x)) = d(y_j, \eta(y_j, x))$ for $j = 1, 2, \ldots, d + 1$. Then the $\Gamma_1$-region
is $\Gamma_1^r(x) = \bigcup_{j=1}^{d+1} \left(\Gamma_1^r(x) \cap R_M(y_j)\right)$, where $\Gamma_1^r(x) \cap R_M(y_j) = \{z \in R_M(y_j) : d(y_j, \eta(y_j, z)) \ge d(y_j, \xi_j(x))\}$,
for $j = 1, 2, \ldots, d + 1$.

Let $X_{\varphi} := \arg\min_{X \in \mathcal{X}_n} d(X, \varphi)$ be the closest point among $\mathcal{X}_n$ to face $\varphi$. Then it is easily seen that
$\Gamma_1^r(\mathcal{X}_n, M) = \bigcap_{i=1}^{d+1} \Gamma_1^r(X_{\varphi_i}, M)$, where $\varphi_i$ is the face opposite vertex $y_i$, for $i = 1, 2, \ldots, d + 1$. So $\Gamma_1^r(\mathcal{X}_n, M) \cap
R_M(y_i) = \{z \in R_M(y_i) : d(y_i, \eta(y_i, z)) \ge d(y_i, \xi_i(X_{\varphi_i}))\}$, for $i = 1, 2, \ldots, d + 1$.

Let the domination number be $\gamma_n(r, F, M, d) := \gamma_n(\mathcal{X}_n; F, N_{PE}^r, d)$ and $X_{[i]} := \arg\min_{X \in \mathcal{X}_n \cap R_M(y_i)} d(X, \varphi_i)$.
Then $\gamma_n(r, M) \le d + 1$ with probability 1, since $\mathcal{X}_n \cap R_M(y_i) \subset N_{PE}^r(X_{[i]}, M)$ for each of $i = 1, 2, \ldots, d + 1$.

In $S(\mathcal{Y})$, drawing the hyper-surfaces $Q_i(r, x)$ such that $d(y_i, \varphi_i) = r\, d(y_i, Q_i(r, x))$ for $i \in \{1, 2, \ldots, d + 1\}$
yields another polytope, denoted as $\mathscr{T}^r$, for $r < (d + 1)/d$. Let $\gamma_n(r, M, d) := \gamma(\mathcal{X}_n, N_{PE}^r, M, d)$ be the
domination number of the PCD based on the extension of $N_{PE}^r(\cdot, M)$ to $\mathbb{R}^d$. Then we conjecture the
following:

Conjecture 8.1. Suppose $\mathcal{X}_n$ is a set of iid random variables from the uniform distribution on a simplex in $\mathbb{R}^d$.
Then as $n \to \infty$, the domination number $\gamma_n(r, M, d)$ in the simplex satisfies

$$\gamma_n(r, M, d) \sim \begin{cases} d + \mathrm{BER}(1 - p_{r,d}) & \text{for } r \in [1, (d+1)/d) \text{ and } M \in \{t_1(r), t_2(r), \ldots, t_{d+1}(r)\}, \\ d - 1 & \text{for } r > (d+1)/d \text{ and } M \in S(\mathcal{Y})^o, \\ d + 1 & \text{for } r \in [1, (d+1)/d) \text{ and } M \in \mathscr{T}^r \setminus \{t_1(r), t_2(r), \ldots, t_{d+1}(r)\}, \end{cases} \tag{15}$$

where $p_{r,d}$ can be calculated by intensive numerical integration as in the calculation of Equation (3), and
for $r = (d + 1)/d$ and $M = M_C$, $p_{r,d}$ will be different from the continuous extension of Equation (15).



9 Discussion and Conclusions 



In this article, we consider the asymptotic distribution of the domination number of proportional-edge
proximity catch digraphs (PCDs) for testing bivariate spatial point patterns of segregation and association.
To our knowledge, the PCD-based methods are the only graph-theoretic methods for testing spatial patterns
in the literature (Ceyhan and Priebe (2005), Ceyhan et al. (2006), and Ceyhan et al. (2007)). The new PCDs,
when compared to class cover catch digraphs (CCCDs), have some advantages. In particular, the asymptotic
distribution of the domination number $\gamma_n(r, M)$ of the proportional-edge PCDs, unlike that of CCCDs, is
mathematically tractable (although computable by numerical integration). A minimum dominating set
can be found in polynomial time for proportional-edge PCDs in $\mathbb{R}^d$ for all $d \ge 1$, but finding a minimum
dominating set is an NP-hard problem for CCCDs (except in $\mathbb{R}$). These nice properties of proportional-edge
PCDs are due to the geometry invariance of the distribution of $\gamma_n(r, M)$ for uniform data in triangles.

On the other hand, CCCDs are easily extendable to higher dimensions and are defined for all $\mathcal{X}_n \subset \mathbb{R}^d$,
while proportional-edge PCDs are only defined for $\mathcal{X}_n \subset C_H(\mathcal{Y}_m)$. Furthermore, the CCCDs based on balls
use proximity regions that are defined by the obvious metric, while the PCDs in general do not suggest a
metric. In particular, our proportional-edge PCDs are based on some sort of dissimilarity measure, but not
a metric.

The finite sample distribution of $\gamma_n(r, M)$, although computationally tedious, can be found by numerical
methods, while that of CCCDs can only be empirically estimated by Monte Carlo simulations. Moreover,
we had to introduce many auxiliary tools to compute the distribution of $\gamma_n(r, M)$ in $\mathbb{R}^2$. The same tools will
work in higher dimensions, perhaps with more complicated geometry. The proportional-edge PCDs lend
themselves to such a purpose, because of the geometry invariance property for uniform data on Delaunay
triangles. Let the two samples of sizes $n$ and $m$ be from classes $\mathcal{X}$ and $\mathcal{Y}$, respectively, with $\mathcal{X}$ points being
used as the vertices of the PCDs and $\mathcal{Y}$ points being used in the construction of the Delaunay triangulation. For
the domination number approach to be appropriate, $n$ should be much larger than $m$. This implies
that $n$ tends to infinity while $m$ is assumed to be fixed. That is, the imbalance in the relative abundance of
the two classes should be large for this method. Such an imbalance usually confounds the results of other
spatial interaction tests. Furthermore, we can also use the normal approximation to the binomial distribution
for the domination number, provided $n$ is much larger than $m$, with both sizes tending to infinity. Therefore,
as long as $n \gg m \to \infty$, we can remove the conditioning on $m$.

The null hypothesis is assumed to be CSR of $\mathcal{X}$ points, i.e., the uniformity of $\mathcal{X}$ points in the convex hull
of $\mathcal{Y}$ points. Although we have two classes here, the null pattern is not CSR independence, since for finite
$m$, we condition on $m$ and the locations of the $\mathcal{Y}$ points are irrelevant as long as they are not co-circular.
That is, the $\mathcal{Y}$ points can result from any pattern that results in a unique Delaunay triangulation. When
$m \to \infty$, conditioning on $m$ does not persist.

There are many types of parametrizations for the alternatives. The particular parametrization of the
alternatives in Equation (6) is chosen so that the distribution of the domination number under the alternatives
would be geometry invariant (i.e., independent of the geometry of the support triangles). The more natural
alternatives (i.e., the alternatives that are more likely to be found in practice) can be similar to or might
be approximated by our parametrization, because in any segregation alternative the $\mathcal{X}$ points will tend to
be further away from the $\mathcal{Y}$ points, and in any association alternative the $\mathcal{X}$ points will tend to cluster around the
$\mathcal{Y}$ points. Such patterns can be detected by the test statistics based on the domination number, since
under segregation (whether it is parametrized as in Section 4 or not) we expect these statistics to be smaller, and
under association (regardless of the parametrization) they tend to be larger.
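The tail directions described above, together with the binomial and normal approximations discussed in this section, can be put into a short sketch. Everything named here is an assumption for illustration: `domination_p_value` is a hypothetical helper, `b` stands for a binomial-type count over the $J_m$ triangles, `q0` is a caller-supplied null success probability (no specific value is implied), and the cutoff of 30 triangles follows the recommendation given later in this section, with $J_m = 30$ itself assigned to the binomial branch by choice.

```python
import math

def binomial_cdf(k, size, q):
    """P(B <= k) for B ~ Binomial(size, q), by direct summation."""
    if k < 0:
        return 0.0
    k = min(k, size)
    return sum(math.comb(size, i) * q**i * (1 - q) ** (size - i)
               for i in range(k + 1))

def normal_cdf(z):
    """Standard normal distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def domination_p_value(b, j_m, q0, alternative):
    """One-sided p-value for a count b over j_m Delaunay triangles.

    'segregation' uses the lower tail (smaller values expected),
    'association' uses the upper tail (larger values expected).
    """
    if alternative not in ("segregation", "association"):
        raise ValueError("alternative must be 'segregation' or 'association'")
    if j_m > 30:
        # Normal approximation to the binomial for many triangles.
        z = (b - j_m * q0) / math.sqrt(j_m * q0 * (1.0 - q0))
        lower = normal_cdf(z)
        return lower if alternative == "segregation" else 1.0 - lower
    # Exact binomial tail for few triangles.
    if alternative == "segregation":
        return binomial_cdf(b, j_m, q0)
    return 1.0 - binomial_cdf(b - 1, j_m, q0)
```

A call such as `domination_p_value(11, 13, 0.5, "association")` uses the exact binomial tail, while the same call with a few hundred triangles switches to the normal approximation.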

By construction our method uses only the $\mathcal{X}$ points in $C_H(\mathcal{Y}_m)$ (the convex hull of $\mathcal{Y}$ points), which might
cause substantial data (hence information) loss. To mitigate this, we propose a correction for the proportion
of $\mathcal{X}$ points outside $C_H(\mathcal{Y}_m)$, because the pattern inside $C_H(\mathcal{Y}_m)$ might not be the same as the pattern
outside $C_H(\mathcal{Y}_m)$. We suggest analysis with our domination number approach in two steps: (i) analysis
restricted to $C_H(\mathcal{Y}_m)$, which provides inference only for $\mathcal{X}$ points in $C_H(\mathcal{Y}_m)$, and (ii) overall analysis with
convex hull correction (i.e., for all $\mathcal{X}$ points with respect to $\mathcal{Y}_m$). When the number of Delaunay triangles
based on $\mathcal{Y}$ points, denoted $J_m$, is less than 30, we recommend the use of the binomial distribution as $n \to \infty$
(i.e., for large $n$); when $J_m$ is larger than 30, we recommend the use of the normal approximation as $n \to \infty$. For
small samples, one might use Monte Carlo simulation or randomization with our approach or apply a finite
sample correction as in Section 7. In the case of small samples with some $\mathcal{X}$ points existing outside $C_H(\mathcal{Y}_m)$,
the convex hull correction can be implemented first, and then the small sample correction. Furthermore, when
testing against segregation we recommend the parameter $r \approx 1.3$, while for testing against association we
recommend the parameter $r \approx 1.35$, as they exhibit the best performance in terms of size and power. The
proportional-edge PCDs have applications in classification. This can be performed by building discriminant
regions in a manner analogous to the procedure proposed in Priebe et al. (2003a).